               Efficient algorithms for clone items detection

               Raoul Medina, Caroline Noyer, and Olivier Raynaud
                        LIMOS - Université Blaise Pascal,
             Campus universitaire des Cézeaux, Clermont-Ferrand, France
                           {medina, raynaud}@isima.fr


        Abstract. This paper presents efficient algorithms for detecting clone
        items in a binary relation. The best implementation of these algorithms
        runs in O(|J|.||M||) time, where J is the set of items of the relation
        and ||M|| is the size of the relation. This result improves on the
        previous algorithm given in [3], which runs in O(|J|^2.||M||) time.
        Clone items were introduced by Medina and Nourine to explain why the
        number of rules in a minimum cover of a relation is sometimes
        exponential in the number of items of the relation.


 1    Introduction
 The notion of clone items was introduced by Medina and Nourine in [3]. The aim
 of their paper was to understand the combinatorial explosion that may arise
 in a minimum basis of a relation. The best-known example of such an exponential
 minimum basis is given by Mannila and Räihä in [2]. Medina and Nourine noticed
 that in this example some items play symmetrical roles in the basis: for each
 rule containing a given item a in its antecedent, there is a symmetrical rule
 in which item a is replaced by item b. Such symmetrical items are called clone
 items. The clone relation is an equivalence relation, so the items of a binary
 relation can be partitioned into classes of clone items. In [3], the authors
 show how to compute these clone classes and how to use them to reduce the
 binary relation so as to obtain a minimum basis with no clone items. Both the
 detection algorithm and the reduction algorithm have polynomial time
 complexity. The resulting minimum basis is smaller than the original one (in
 the case of the Mannila and Räihä example it reduces to a single rule), and
 the original basis can be reconstructed in polynomial time from the clone-free
 minimum basis and the clone classes.
     Recently, Gely et al. [1] extended the notion of clone items (which are
 defined on closed sets) to P-clone items (defined on pseudo-closed sets) and
 to A-clone items (defined on what they call an "atomized" context). Their
 approach could be generalized to any generation problem where, from a given
 relation, one wants to compute a (potentially exponential) collection of sets
 satisfying a property (e.g. the sets must be closed, be ideals, or be
 pseudo-closed). The idea is then to reduce the combinatorial explosion of the
 desired collection by representing it as a clone-free collection together
 with its classes of clone items. The problem is


Radim Bělohlávek, Václav Snášel (Eds.): CLA 2005, pp. 70–81, ISBN 80–248–0863–3.


then to be able to compute the clone classes of the (potentially exponential)
collection without generating its sets. To solve this problem in polynomial time,
the general method could be as follows:
 – Let M be the (potentially exponential) collection of sets satisfying a property
   over a given binary relation R. Compute in polynomial time a collection M'
   such that:
    1. the size of M' is polynomial in the size of R,
    2. items a and b are clone items in M if and only if they are clone items in M'.
 – Detect the clone classes in M'.
    Our paper focuses on the detection phase of this approach. The algorithm
for computing clone classes given in [3] has O(|J|^2.||M||) time complexity,
where J is the set of items and ||M|| is the size of the input collection, that
is, ||M|| = Σ_{m∈M} |m|. In this paper, we present several algorithms for
computing clone classes. The best complexity we obtain is O(|J|.||M||).
    This paper is organized as follows. Section 2 formally defines the problem in
terms of collections of sets and introduces the Abstract Data Type used by our
algorithms. Section 3 describes three computation strategies and studies their
time complexities. Section 4 shows how to use these algorithms to compute the
classes of clone items as defined in [3].


2     General context and definitions
In this section, we first formally define the problem studied. We then present
the Abstract Data Type called Map used in our algorithms and discuss its
possible implementations.

2.1    Clone items in a Sets Collection
Let J be a set of items {x_1, ..., x_{|J|}} and M a sets collection on J. We denote
by ϕ_{a,b} : 2^J → 2^J the mapping which associates with any subset of J its image
under the swap of items a and b. More formally:

\[
\varphi_{a,b}(m) =
\begin{cases}
(m \setminus \{a\}) \cup \{b\} & \text{if } a \in m \text{ and } b \notin m \\
(m \setminus \{b\}) \cup \{a\} & \text{if } b \in m \text{ and } a \notin m \\
m & \text{otherwise}
\end{cases}
\]

Definition 1. Let M be a collection of sets defined on J. We say that items a
and b are clone items in M if and only if ∀m ∈ M, ϕ_{a,b}(m) ∈ M.
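
As an illustration of Definition 1, the following Python sketch transcribes
the swapping function and the clone test directly. The names swap and
are_clones, and the use of frozenset to represent itemsets, are assumptions
made for this example and do not come from the paper. The test is a
brute-force check, not one of the algorithms studied in Section 3.

```python
from typing import FrozenSet, Hashable, Iterable

Item = Hashable
ItemSet = FrozenSet[Item]


def swap(m: ItemSet, a: Item, b: Item) -> ItemSet:
    """Return the image of m under the swapping map phi_{a,b}."""
    if a in m and b not in m:
        return (m - {a}) | {b}
    if b in m and a not in m:
        return (m - {b}) | {a}
    return m  # m contains both a and b, or neither


def are_clones(collection: Iterable[Iterable[Item]], a: Item, b: Item) -> bool:
    """Brute-force test of Definition 1: every image must stay in the collection."""
    sets = {frozenset(m) for m in collection}
    return all(swap(m, a, b) in sets for m in sets)
```

For instance, in the collection {{a, c}, {b, c}, {c}}, items a and b are clone
items, whereas a and c are not, since ϕ_{a,c}({b, c}) = {a, b} does not belong
to the collection.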
    The clone items concept is a binary one: the question "are a and b clone
items?" can only be answered "true" or "false". When the answer is negative, it
would be useful to know more: are a and b far from being clone items, and why
are they not clone items? For this purpose, we introduce a measure that
quantifies the clone property. This measure will represent
a distance between two items a and b, based on the definition of the clone items
property. This distance is exactly the number of elements m of M whose image
ϕ_{a,b}(m) under the swapping function does not belong to M. When this distance
is zero, a and b are clone items. The greater the distance, the farther a and b
are from being clone items. More formally:

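Definition 2. Let M be a sets collection on J and let (a, b) ∈ J^2. We call
distance between a and b, denoted by d_M(a, b), the value of the mapping:

\[
d_{\mathcal{M}} : J^2 \to \mathbb{N}, \qquad
(a, b) \mapsto \bigl|\{\, m \in \mathcal{M} \mid \varphi_{a,b}(m) \notin \mathcal{M} \,\}\bigr|
\]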


      Thanks to Definition 2, clone items can be characterized as follows:

Proposition 1. Let M be a sets collection on J and (a, b) ∈ J^2. Items a and b
are clone items if and only if d_M(a, b) = 0.
    The problem we study in this paper is the computation of the distances
between all pairs of items of J:
Problem 1 (Distance).
    Data: a sets collection M on J;
    Result: the matrix d_M.
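
Problem 1 can be solved naively by applying Definition 2 to every pair of
items. The sketch below does exactly that. The function names and the
dictionary used to hold the matrix are illustrative assumptions, and the code
makes no attempt to reach the O(|J|.||M||) bound discussed in this paper.

```python
from itertools import combinations


def swap(m, a, b):
    """Image of the frozenset m under the swapping map phi_{a,b}."""
    if a in m and b not in m:
        return (m - {a}) | {b}
    if b in m and a not in m:
        return (m - {b}) | {a}
    return m


def distance(sets, a, b):
    """d_M(a, b): number of sets whose swapped image leaves the collection."""
    return sum(1 for m in sets if swap(m, a, b) not in sets)


def distance_matrix(collection, items):
    """Brute-force matrix d_M as a dict keyed by ordered pairs of items."""
    sets = {frozenset(m) for m in collection}
    d = {(a, a): 0 for a in items}  # phi_{a,a} is the identity
    for a, b in combinations(items, 2):
        # phi_{a,b} and phi_{b,a} are the same map, so d_M is symmetric.
        d[a, b] = d[b, a] = distance(sets, a, b)
    return d
```

On the collection {{a, c}, {b, c}, {c}}, this gives d_M(a, b) = 0 and
d_M(a, c) = d_M(b, c) = 2.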
    We now present the main property on which our algorithms rely. It
characterizes a pair (m, m') of sets of the collection M such that m = ϕ_{a,b}(m').

Proposition 2. Let M be a sets collection defined over J, m and m' two
distinct sets of M and a, b two items of J such that a ∈ m and b ∈ m'. Then the
following assertions are equivalent:
1. m = ϕ_{a,b}(m')
2. m' = ϕ_{a,b}(m)
3. |m| = |m'| and m \ m' = {a} and m' \ m = {b}
4. m \ {a} = m' \ {b}
   This property states that two sets m and m' are each other's image under
the swapping function ϕ_{a,b} if and only if they have the same size t and share
t − 1 items. It follows directly from the definition of the mapping ϕ_{a,b}.
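
Assertion 4 suggests a simple way of finding all pairs of sets that are each
other's image under some swap: index every set by what remains once one of its
items is removed. The Python sketch below illustrates this idea only; the name
swap_pairs and the bucket layout are assumptions made for the example and are
not taken from the algorithms of this paper.

```python
from collections import defaultdict
from itertools import combinations


def swap_pairs(collection):
    """Return tuples (a, b, m, m2) such that m - {a} == m2 - {b}.

    By assertion 4 of Proposition 2, each such tuple satisfies
    m = phi_{a,b}(m2) and m2 = phi_{a,b}(m).
    """
    buckets = defaultdict(list)  # remainder -> [(removed item, original set)]
    for m in {frozenset(s) for s in collection}:
        for a in m:
            buckets[m - {a}].append((a, m))
    pairs = []
    for entries in buckets.values():
        # Two entries of one bucket share the same remainder, hence they
        # differ only by the removed items a and b.
        for (a, m), (b, m2) in combinations(entries, 2):
            pairs.append((a, b, m, m2))
    return pairs
```

On the collection {{a, c}, {b, c}, {c}}, the only pair found is the one formed
by {a, c} and {b, c}, which share the remainder {c}.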

2.2     Abstract Data Type : Key Mapping
Interface. We use a Map abstract data type similar to the Map interface of
the Java language. This data structure maps keys to values. In our case, the keys
are the sets of the collection. The values associated with the keys depend on
the algorithm.
   This abstract data type provides the following operators:

 – new() operator: creates a Map object and returns an empty map.
 – get(e) operator: returns the value associated with the key e if this key is
   mapped to a value, or Nil otherwise.
 – put(e,value) operator: inserts the set e in the map and associates value with it.
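
The three operators above can be sketched in Python as follows. This is only
an illustration of the interface: the class name SetMap is hypothetical, the
constructor plays the role of new(), None stands for Nil, and the backing
store is an ordinary hash table rather than a structure tailored to sets.

```python
class SetMap:
    """Minimal Map abstract data type whose keys are sets of items."""

    def __init__(self):
        # new(): start from an empty map.
        self._table = {}

    def get(self, e):
        """Return the value mapped by the set e, or None (Nil) if absent."""
        return self._table.get(frozenset(e))

    def put(self, e, value):
        """Insert the set e in the map and associate value with it."""
        self._table[frozenset(e)] = value
```

Keys are converted to frozenset so that two sets with the same items, whatever
the order in which they are given, address the same entry.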

    The time complexities of these operators depend heavily on the data structure
used to implement the Map data type. We propose an implementation that takes
advantage of the type of the keys, namely sets: a lexicographic tree, in which
itemsets are represented by branches of the tree.

Implementation: lexicographic tree. We first give a formal definition of a
lexicographic tree corresponding to a sets collection.

Definition 3. Let M be a sets collection defined over J, with a total order on
J denoted by <_J. A unique lexicographic tree is associated with M such that:

 – Each edge of the tree is labeled with an element of J;
 – Each marked node of the tree corresponds to a set of M;
 – Each set m of M corresponds to a unique path in the tree (starting from the
   root and ending at a marked node) such that the set of labels on this path
   is exactly the set m;
 – For any path from the root to any node, the order of the successive labels
   respects the order defined by <_J;
 – The order of the edges leaving a node respects the order defined by <_J.
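
A lexicographic tree in the sense of Definition 3 can be sketched in Python as
a trie whose paths spell out itemsets in increasing order. The class and
attribute names below are hypothetical, the order <_J is assumed to be
Python's built-in ordering on the items, and the children of a node are kept
in a dictionary, so the order of sibling edges is not materialized in this
simplified version.

```python
class LexTree:
    """Trie over sorted itemsets, offering the get/put operators of a Map."""

    class _Node:
        __slots__ = ("children", "marked", "value")

        def __init__(self):
            self.children = {}   # edge label (item) -> child node
            self.marked = False  # True if the path to this node is a stored set
            self.value = None

    def __init__(self):
        self._root = LexTree._Node()

    def put(self, itemset, value):
        """Insert itemset along its sorted path and attach value to its end node."""
        node = self._root
        for item in sorted(set(itemset)):
            node = node.children.setdefault(item, LexTree._Node())
        node.marked = True
        node.value = value

    def get(self, itemset):
        """Return the value attached to itemset, or None if it is not stored."""
        node = self._root
        for item in sorted(set(itemset)):
            node = node.children.get(item)
            if node is None:
                return None
        return node.value if node.marked else None
```

Sorting the items before walking the tree ensures that each set follows a
single path from the root, as required by the definition; a prefix of a stored
set is only reported as present if its end node is marked.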

   Figure 1 gives an example of a collection and its associated lexicographic tree.

[Figure 1: an example sets collection and its associated lexicographic tree.]
                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                           # 
                                                                                                                                                                                                                                                              $
                                                                                                                                                                                                                                                                                                !"
                                                                                                                                                                                                                                                                                                  "
                                                                                                                                                                                                                                                                                                   !              
                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                          # $
                                                                                                                                                                                                                                                                                                                                                          !"
                                                                                                                                                                                                                                                                                                                                                            "
                                                                                                                                                                                                                                                                                                                                                             !              
                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                             !"
                                                                                                                                                                                                                                                                                                                                                                                                "
                                                                                                                                                                                                                                                                                                                                                                                                 !   b          
                                                                                                                                                                                                                                                                                                                                                                                                                                           *)('*)(' "!"!  *                          )('*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ' ) (                        )('*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                *
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ' ) (              c*                      )('*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ' ) (                                                  )*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             *
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              )                              )*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               *
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                )           )*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              *
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               )               
                          
                            
                                                      
                                                          
                                                          
                                                                             
                                                                                
                                                                                                      
                                                                                                          
                                                                                                                          
                                                                                                                                                 #
                                                                                                                                                                                                                   
                                                                                                                                                                                                                    #                                           # 
                                                                                                                                                                                                                                                                  $ # 
                                                                                                                                                                                                                                                                      $                                                         # $ # $                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         , 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ,                                                , 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - , 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  -                                             , - , -
                                                          
                                                                               
                                                                                    
                                                                                            b                      
                                                                                                                                                    d #                                                     #                                                #$
                                                                                                                                                                                                                                                                          $
                                                                                                                                                                                                                                                                           #$                                                          $#$#                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             , ,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ,                                                  , 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - ,-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          -
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ,-                                          , - -,-,, d {e}
                                                                                                            
                                                                                                                                                                                                                     
                                                                                                                                                                                                                       #                                                     #                                         c$#$#                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ,                                                           ,
                                    
                                       
                                                              
                                                                                      
                                                                                                             
                                                                                                                                     
                                                                                                                                           %%%                                                                 % 
                                                                                                                                                                                                                           #                                                   # 
                                                                                                                                                                                                                                                                                 $                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ,                                                           , 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 -  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ++ --,  {c,d}
                                                                                                                                       {a,d} %%                                                                                                                                                                                                                                                                                                                                                                                                                                                                           +++ e
                                               c                                                                                                                                                                          %&%& d                                                                                                                                                                                                                                                                                                                                                                                     ++ 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                 &&
                                                                                                                                                                                                                                    &&& e                                                                                                                                                                                                                                                                                                                                                                          {c,d,e}
                       {a,b,c}                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                          {b,c,d,e}


Fig. 1. The lexicographic tree corresponding to the collection {abc, ad, bcde, cd, cde, e},
with J = {a, b, c, d, e} and <J being the alphabetical order of the items. Circled nodes
are the marked nodes corresponding to sets of the collection.


   A boolean can be associated with a node of the tree in order to indicate whether
the node is marked (i.e. corresponds to a set of the collection and thus to a key) or
not. Any field type can be associated with the node for storing the value associated
with the key (on marked nodes only). There are two ways of implementing the list
of children of a node in the lexicographic tree:

 – Using a List structure, each entry of the list containing the label of the
   associated edge and a reference to the child;
 – Using an Array structure indexed by the labels, each entry containing
   either NIL or a reference to the child.

    The complexities of the put and get operators depend on the chosen implementation.

Proposition 3. If the lexicographic tree is implemented with Lists then:

 – The put(m,value) operator has an O(|J|) time complexity;
 – The get(m) operator has an O(|J|) time complexity.

   The access complexity follows from the fact that an item appears only once in a
set. A node is created in constant time.

Proposition 4. If the lexicographic tree is implemented with Arrays then:

 – The put(m,value) operator has an $O(|m| \times |J|)$ time complexity;
 – The get(m) operator has an $O(|m|)$ time complexity.

    The access complexity is due to the fact that we have direct access to a child
labeled by a given item. Creating a node of the tree costs $|J|$ since we need
to create and initialize an array of size $|J|$.
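
The array-based variant can be sketched as follows. This is only an illustration under our own assumptions: the class names `Node` and `LexTreeMap` are hypothetical, items are single-character strings, and the order $<_J$ is taken to be the alphabetical order. Sorting the key on each call is a convenience; the complexities stated above assume the sets are already stored sorted.

```python
from typing import Any, Iterable, List, Optional


class Node:
    """A node of the lexicographic tree with array-indexed children."""

    def __init__(self, n_items: int) -> None:
        # Creating and initializing this array is the O(|J|) cost per node.
        self.children: List[Optional["Node"]] = [None] * n_items
        self.marked = False      # True if the path to this node is a stored key
        self.value: Any = None   # value of the key (meaningful on marked nodes)


class LexTreeMap:
    """Map whose keys are sets of items, stored as sorted paths (hypothetical name)."""

    def __init__(self, items: Iterable[str]) -> None:
        self.index = {x: i for i, x in enumerate(sorted(items))}
        self.root = Node(len(self.index))

    def _path(self, m: Iterable[str]) -> List[int]:
        # Labels along the path, in increasing order of <_J.
        return sorted(self.index[x] for x in m)

    def put(self, m: Iterable[str], value: Any) -> None:
        node = self.root
        for i in self._path(m):
            if node.children[i] is None:
                node.children[i] = Node(len(self.index))
            node = node.children[i]
        node.marked, node.value = True, value

    def get(self, m: Iterable[str]) -> Any:
        node = self.root
        for i in self._path(m):
            node = node.children[i]   # direct access to the child labeled i
            if node is None:
                return None
        return node.value if node.marked else None
```

An unmarked inner node (a proper prefix of a stored set) returns `None`, which plays the role of NIL here.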


3     Strategies and algorithms

In this section, we study three different strategies for solving the Distance problem.
All strategies rely on the same basic idea: first the distance matrix is initialized
with the maximum possible value of each distance $d_M(a, b)$. Then,
each time the algorithm finds $m$ and $m'$ such that $m = \phi_{a,b}(m')$, the distance
$d_M(a, b)$ is decreased by one. The main difference between the three strategies is the
way they detect that $m = \phi_{a,b}(m')$.
    The first strategy is the one given in [3]. We call it set existence checking.
The main idea of the algorithm is, given $m \in M$, to compute all possible sets $\phi_{a,b}(m)$
and then check whether these sets belong to the collection. The second strategy is called
$\phi_{a,b}$ relation checking. Its principle is, for each pair $(m, m')$, to check whether there
exist items $a$ and $b$ such that $m = \phi_{a,b}(m')$. This test is done using Property 2.
The third strategy is called classes computation. Based on Property 2, it computes
classes $C$ of itemsets such that for any pair $(m, m')$ of $C$, there exist items $a$ and
$b$ such that $m = \phi_{a,b}(m')$.
    We first discuss the initialization of the distance matrix since this is
common ground for all strategies.


3.1   Discussion on the distance matrix
Since the notion of distance $d_M(a, b)$ is symmetric, the distance matrix
is also symmetric. We thus choose to represent it by a triangular matrix such
that:
                              $d_M(a, a) = 0$
                              $d_M(a, b) = d_M(b, a)$
   In this section we discuss the different strategies for the initialization and the
update of the distance matrix, as well as their respective costs.
   Let $M = M_{a\bar{b}} + M_{\bar{a}b} + M_{ab} + M_{\bar{a}\bar{b}}$, with:
           - $M_{a\bar{b}}$: the sets $m$ of $M$ such that $a \in m$ and $b \notin m$
           - $M_{\bar{a}b}$: the sets $m$ of $M$ such that $a \notin m$ and $b \in m$
           - $M_{ab}$: the sets $m$ of $M$ such that $a \in m$ and $b \in m$
           - $M_{\bar{a}\bar{b}}$: the sets $m$ of $M$ such that $a \notin m$ and $b \notin m$.

Incrementation or decrementation strategy? There are two ways of computing
the distance matrix:
 – Either by first initializing all distances to 0 and then increasing by 1 the
   distance $d_M(a, b)$ each time we find $m \in M$ such that $\phi_{a,b}(m) \notin M$,
 – or by first initializing the distances with a maximal value and then decreasing
   by 1 the distance $d_M(a, b)$ each time we find $m$ and $m' \in M$ such that
   $m = \phi_{a,b}(m')$.
    What is the maximal value that can be taken by $d_M(a, b)$? Clearly the answer
is $|M_{a\bar{b}}| + |M_{\bar{a}b}|$. Indeed, for any set $m \in M_{ab} \cup M_{\bar{a}\bar{b}}$ we have $m = \phi_{a,b}(m)$
and thus $m$ cannot increase the distance between $a$ and $b$. This value is also the
maximum number of basic operations (either incrementation or decrementation)
needed to compute $d_M(a, b)$ in the worst case. Thus, whichever strategy
we choose, the number of basic operations is the same in the worst
case.
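
As a point of comparison, the incrementation strategy and the upper bound can be written down directly. The sketch below is a brute-force reference, not one of the strategies studied here. It assumes that $\phi_{a,b}(m)$ exchanges $a$ and $b$ in $m$ and leaves $m$ unchanged when $m$ contains both items or neither; the function names are ours.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def phi(m: FrozenSet[str], a: str, b: str) -> FrozenSet[str]:
    """Assumed definition of phi_{a,b}: swap a and b in m."""
    # When exactly one of a, b is in m, the symmetric difference swaps them.
    return m ^ {a, b} if (a in m) != (b in m) else m


def distance_by_increment(collection: Iterable[Iterable[str]],
                          items: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """d(a, b) = number of sets m of the collection with phi_{a,b}(m) not in it."""
    M = {frozenset(m) for m in collection}
    return {(a, b): sum(1 for m in M if phi(m, a, b) not in M)
            for a, b in combinations(sorted(items), 2)}


def max_distance(collection: Iterable[Iterable[str]], a: str, b: str) -> int:
    """Upper bound: number of sets containing exactly one of a and b."""
    M = {frozenset(m) for m in collection}
    return sum(1 for m in M if (a in m) != (b in m))
```

On the collection {abc, ad, bcde, cd, cde, e}, the sets ad and bcde each contain exactly one of a and b and neither of their images is in the collection, so the distance between a and b reaches its maximal value 2.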
    What is the best time complexity for both strategies? We consider that the
basic operations can be done in $O(1)$. Thus the overall complexity is:

$$O\Big(\sum_{a \in J} \sum_{b \in J} \big(|M_{a\bar{b}}| + |M_{\bar{a}b}|\big)\Big).$$

Now, consider $m \in M$. In the worst case, $m$ is taken into account in at
most $|m| \times |J \setminus m|$ distances $d_M(a, b)$. Indeed, for $m$ to be taken into account in
$d_M(a, b)$, either $a$ or $b$ belongs to $m$ but not both. Thus, we have:

$$O\Big(\sum_{a \in J} \sum_{b \in J} \big(|M_{a\bar{b}}| + |M_{\bar{a}b}|\big)\Big) = O\Big(\sum_{m \in M} |m| \times |J \setminus m|\Big).$$

This can be rewritten as follows:

$$O\Big(\sum_{m \in M} |m| \times (|J| - |m|)\Big) = O\Big(\Big(\sum_{m \in M} |m| \times |J|\Big) - \Big(\sum_{m \in M} |m| \times |m|\Big)\Big).$$

And since $|m| \times |m|$ is less than or equal to $|m| \times |J|$, we obtain the worst case
complexity:

$$O\Big(\sum_{m \in M} |m| \times |J|\Big) = O(|J| \times \|M\|).$$

    Thus, whichever update strategy is chosen, the time complexity cannot be better
than $O(|J| \times \|M\|)$ in the worst case (under the hypothesis that the basic operations
increment or decrement by 1).
    In this paper we choose to adopt the decrementation strategy since our
algorithms are based on Property 2. We now discuss the complexity of the
initialization of the distance matrix.


Initializing the distance matrix. We initialize each distance $d_M(a, b)$ with
its maximal possible value, i.e. with $|M_{a\bar{b}}| + |M_{\bar{a}b}|$. To do so, all distances are
first set to 0, which can be done in $O(|J| \times |J|)$.
   Then, for each $m \in M$ we increment by 1 the distances $d_M(a, b)$ with $a \in m$
and $b \notin m$. As shown in the previous subsection, this can be done in $O(|J| \times \|M\|)$.


Algorithm 1: InitDistance($M$)
      Data : A collection $M$ of sets defined over $J$.
      Result : The distance matrix $d_M$ such that for all $(a, b) \in J^2$ we have
               $d_M(a, b) = |M_{a\bar{b}}| + |M_{\bar{a}b}|$.
      begin
         foreach $a \in J$ do
             foreach $b \in J$ do
                 $d_M(a, b) = 0$;

         foreach $m \in M$ do
             foreach $a \in m$ do
                 foreach $b \notin m$ do
                     $d_M(a, b)$++;

         return $d_M$;
      end
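
A minimal Python rendering of this initialization is given below, under our own conventions: the triangular matrix is a dictionary keyed by ordered pairs `(a, b)` with `a < b`, and the function additionally returns the number of increments so that the bound on basic operations can be observed. The name `init_distance` and the operation counter are ours.

```python
from typing import Dict, Iterable, Tuple


def init_distance(collection: Iterable[Iterable[str]],
                  items: Iterable[str]) -> Tuple[Dict[Tuple[str, str], int], int]:
    """Return (d, ops): d[(a, b)] counts the sets containing exactly one of a, b."""
    J = sorted(items)
    # Triangular matrix: one cell per unordered pair, all set to 0 first.
    d = {(a, b): 0 for i, a in enumerate(J) for b in J[i + 1:]}
    ops = 0
    for m in {frozenset(s) for s in collection}:
        outside = [b for b in J if b not in m]
        for a in m:
            for b in outside:
                d[(a, b) if a < b else (b, a)] += 1   # basic operation
                ops += 1
    return d, ops
```

Each set $m$ contributes exactly $|m| \times |J \setminus m|$ increments, which is the quantity bounded by $|J| \times \|M\|$ in the derivation above.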


3.2     Set Existence Checking Strategy
For each set $m$ of $M$ and for every pair $(a, b)$ of $J^2$, we check whether $\phi_{a,b}(m)$ belongs to
$M$. In order to check the existence of $\phi_{a,b}(m)$, we choose to store the collection
$M$ in the Map structure presented before. The sets of $M$ are the keys while
their associated value is "present".
    Beware that since a distance $d_M(a, b)$ is initialized with $|M_{a\bar{b}}| + |M_{\bar{a}b}|$, this
distance should be decremented only when $m \neq \phi_{a,b}(m)$. Indeed, if $m = \phi_{a,b}(m)$
then $m \notin M_{a\bar{b}} \cup M_{\bar{a}b}$.


Algorithm 2: ComputeDistance($M$): Set Existence Checking Strategy
      Data : A collection $M$ of sets defined over $J$.
      Result : The distance matrix $d_M$.
      begin
         $d_M$ = InitDistance($M$);
         $T$ = new Map();
         foreach $m \in M$ do
             $T$.put($m$, "present");

 1       foreach $m \in M$ do
 2           foreach couple $(a, b) \in J^2$ do
                 if $T$.get($\phi_{a,b}(m)$) $\neq$ NIL and $m \neq \phi_{a,b}(m)$ then
                    $d_M(a, b)$--;

         return $d_M$;
      end


Proposition 5. The Set Existence Checking Strategy has an $O(|J|^2 \times \|M\|)$
time complexity.
Proof. Initialization of the matrix is done in $O(|J| \times \|M\|)$. We suppose that
the Map structure $T$ is implemented using Arrays. Thus, the initialization of
$T$ is done in $O(\sum_{m \in M} |m| \times |J|) = O(|J| \times \|M\|)$. The loop on line 1 does $|M|$
iterations while the loop on line 2 does $|J|^2$ iterations. The $\phi_{a,b}(m)$ operation as well
as the test $m \neq \phi_{a,b}(m)$ and the get($\phi_{a,b}(m)$) operation can be done in $O(|m|)$ time.
The overall complexity is thus $O(\sum_{m \in M} |J|^2 \times |m|) = O(|J|^2 \times \|M\|)$.

    This strategy is a slight improvement of the one presented in [3]. Note that
it can be improved a little more by choosing $a$ in $m$ and $b$ in $J \setminus m$.
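
The strategy, with the refinement of choosing $a$ in $m$ and $b$ in $J \setminus m$, might look as follows in Python. This is a sketch under assumptions of ours: a built-in set of frozensets stands in for the Map structure, the matrix is a dictionary keyed by pairs `(a, b)` with `a < b`, and $\phi_{a,b}(m)$ for $a \in m$, $b \notin m$ is taken to be $(m \setminus \{a\}) \cup \{b\}$.

```python
from typing import Dict, Iterable, Tuple


def set_existence_distance(collection: Iterable[Iterable[str]],
                           items: Iterable[str]) -> Dict[Tuple[str, str], int]:
    J = sorted(items)
    M = {frozenset(m) for m in collection}   # stand-in for the Map T

    def key(a: str, b: str) -> Tuple[str, str]:
        return (a, b) if a < b else (b, a)

    d = {(a, b): 0 for i, a in enumerate(J) for b in J[i + 1:]}
    # Initialization with the maximal values.
    for m in M:
        for a in m:
            for b in J:
                if b not in m:
                    d[key(a, b)] += 1
    # Decrement once for every m whose image phi_{a,b}(m) is also stored.
    for m in M:
        for a in m:
            for b in J:
                if b not in m and ((m - {a}) | {b}) in M:
                    d[key(a, b)] -= 1
    return d
```

Restricting the pairs to $a \in m$, $b \notin m$ guarantees $m \neq \phi_{a,b}(m)$, so the explicit inequality test is no longer needed.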

3.3     $\phi_{a,b}$ Relation Checking Strategy
For any pair of elements $(m, m')$ of $M$, we check whether there exist $a$ and $b$ such
that $m = \phi_{a,b}(m')$. According to Property 2, items $a$ and $b$ exist if and only
if $m$ and $m'$ have the same size and share $|m| - 1$ items. In this case, $d_M(a, b)$ is
decremented by 1.
Proposition 6. The $\phi_{a,b}$ relation checking strategy has an $O((|J| + |M|) \times
\|M\|)$ time complexity.
Proof. Initialization of the matrix is done in $O(|J| \times \|M\|)$. The external loop (line
1) does $|M|^2$ iterations. All tests of line 2 can be done in $O(|m|)$ provided that
the sets $m$ and $m'$ are stored sorted according to $<_J$. The complexity of the loop
is in $O(|M| \times \|M\|)$. The overall complexity is thus in $O((|J| + |M|) \times \|M\|)$. □
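
The pairwise test can be sketched in Python as below. The code is illustrative only: it uses frozensets and set differences rather than sorted lists, so it does not reproduce the $O(|m|)$ merge-style test assumed in the proof, and the function name is ours.

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple


def relation_checking_distance(collection: Iterable[Iterable[str]],
                               items: Iterable[str]) -> Dict[Tuple[str, str], int]:
    J = sorted(items)
    M = list({frozenset(m) for m in collection})

    def key(a: str, b: str) -> Tuple[str, str]:
        return (a, b) if a < b else (b, a)

    d = {(a, b): 0 for i, a in enumerate(J) for b in J[i + 1:]}
    for m in M:                                   # initialization
        for a in m:
            for b in J:
                if b not in m:
                    d[key(a, b)] += 1
    # Ordered pairs (m, m2): each related pair is met twice, once per set.
    for m, m2 in permutations(M, 2):
        if len(m) == len(m2):
            only_m, only_m2 = m - m2, m2 - m
            if len(only_m) == 1 and len(only_m2) == 1:
                (a,), (b,) = only_m, only_m2
                d[key(a, b)] -= 1
    return d
```

Because the loop ranges over ordered pairs, both $m$ and $m'$ trigger one decrement, which matches the two units they contributed at initialization.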

Algorithm 3: ComputeDistance($M$): $\phi_{a,b}$ Relation Checking Strategy
   Data : A collection $M$ of sets defined over $J$.
   Result : The distance matrix $d_M$.
   begin
      $d_M$ = InitDistance($M$);
 1    foreach $(m, m') \in M^2$ do
 2        if $|m| = |m'|$ and $|m \setminus m'| = 1$ and $|m' \setminus m| = 1$ then
              $d_M(m \setminus m', m' \setminus m)$--;

        return $d_M$;
     end


3.4     Classes Computation Strategy
This strategy also relies on Property 2. Let $m$, $m'$ and $m''$ be sets of
$M$ such that $m = \phi_{a,b}(m')$ and $m = \phi_{a,c}(m'')$. Then, according to Property
2 we have $m \setminus \{a\} = m' \setminus \{b\} = m'' \setminus \{c\}$. And thus, $m' = \phi_{b,c}(m'')$. The idea of
the algorithm is to compute classes of sets $m_i$ of $M$ having $|m_i| - 1$ common
items. Thus, a class $C$ can be represented by the set of common items and we
memorize in a set Union all the extra items $x_i$ which are not common. In the
Map structure we use, the set $C$ will be the key while the set Union will be the
value associated with the key.
    Then, for any two distinct items $a$ and $b$ of Union, the set $m = C \cup \{a\}$ belongs
to the class and, according to Property 2, we have $\phi_{a,b}(m) = C \cup \{b\} \in M$ and
$m \neq \phi_{a,b}(m)$. And thus, $d_M(a, b)$ has to
be decremented. Note that a set $m$ can belong to at most $|m|$ classes.
    The algorithm is quite straightforward using the Map structure. For every set
$m$ of $M$ we insert each of its $|m|$ subsets of size $|m| - 1$ in the Map structure.
If the key was already present, we just append the extra item of $m$ to the set
Union and update all the necessary entries in the distance matrix. Otherwise, a
new key $m \setminus \{x\}$ is inserted in the Map structure and its associated Union value
is initialized with $\{x\}$.


Proposition 7. The classes computation strategy has an $O(|J| \times \|M\|)$ time
complexity.
Proof. Initialization of the distance matrix is done in $O(|J| \times \|M\|)$. We suppose
that the Map structure is implemented using a lexicographic tree with Lists. The
line 1 loop does $|M|$ iterations. The loop in line 2 does $|m|$ iterations. In line 3, the
retrieval is done in $O(|J|)$. The loop in line 4 does at most $|J|$ iterations and
the update of the matrix takes constant time. The insertion of line 5 is done in
$O(|J|)$. Thus, the overall complexity is in $O(\sum_{m \in M} |m| \times |J|) = O(|J| \times \|M\|)$.

     Let us illustrate Algorithm 4 with an example. The considered collection
is $M = \{m_1 = \{a, e, f, h\}, m_2 = \{b, e, f, h\}, m_3 = \{b, d, f, h\}\}$. The resulting
lexicographic tree obtained with Algorithm 4 is shown in Figure 2.
     From this tree, we conclude that $m_2$ and $m_3$ share the items $\{b, f, h\}$ and that
$m_2 = \phi_{d,e}(m_3)$. Thus, $d_M(d, e)$ should be decremented. For the same reason,
the distance $d_M(a, b)$ should also be decremented since $m_1 = \phi_{a,b}(m_2)$ (they
share the items $\{e, f, h\}$).
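
The example can be replayed with the following Python sketch of the classes computation. Several points are assumptions of ours and not taken from the algorithm as printed: a dictionary keyed by frozensets stands in for the lexicographic tree, the matrix is keyed by pairs `(a, b)` with `a < b`, and each newly detected pair of sets of a class lowers the distance by 2. The last choice reflects our reading that both sets of the pair were counted once in the initial value $|M_{a\bar{b}}| + |M_{\bar{a}b}|$ of a symmetric matrix.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def classes_distance(collection: Iterable[Iterable[str]], items: Iterable[str]
                     ) -> Tuple[Dict[Tuple[str, str], int], Dict[FrozenSet[str], Set[str]]]:
    J = sorted(items)
    M = {frozenset(m) for m in collection}

    def key(a: str, b: str) -> Tuple[str, str]:
        return (a, b) if a < b else (b, a)

    d = {(a, b): 0 for i, a in enumerate(J) for b in J[i + 1:]}
    for m in M:                                   # initialization
        for a in m:
            for b in J:
                if b not in m:
                    d[key(a, b)] += 1

    classes: Dict[FrozenSet[str], Set[str]] = {}  # key C -> Union of extra items
    for m in M:
        for x in m:
            C = m - {x}
            union = classes.get(C)
            if union is None:
                classes[C] = {x}
            else:
                for y in union:
                    # Assumption: one unit for m and one for the earlier set C + {y}.
                    d[key(x, y)] -= 2
                union.add(x)
    return d, classes
```

On the collection of the example, the key $\{e, f, h\}$ ends with Union $\{a, b\}$ and the key $\{b, f, h\}$ with Union $\{d, e\}$; the resulting distances $d_M(a, b)$ and $d_M(d, e)$ are both 1, because $m_3$ has no image under $\phi_{a,b}$ and $m_1$ has none under $\phi_{d,e}$.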


Algorithm 4: ComputeDistance($M$): Classes Computation Strategy
   Data : A collection $M$ of sets defined over $J$.
   Result : The distance matrix $d_M$.
   begin
      $d_M$ = InitDistance($M$);
      $T$ = new Map();
 1    foreach $m \in M$ do
 2        foreach $x \in m$ do
              $C = m \setminus \{x\}$;
 3            Union = $T$.get($C$);
              if Union $\neq$ NIL then
 4                foreach $y \in$ Union do
                      $d_M(x, y)$--;
                    Union = Union $\cup \{x\}$;
                    $T$.put($C$, Union);
                else
 5                  $T$.put($C$, $\{x\}$);


         return $d_M$;
      end


3.5     Memory usage

In this section we discuss the space complexity required by each strategy and
show how to reduce the memory usage when possible.
     First, the distance matrix has to be present in memory. It
requires $O(|J|^2)$ memory space.
     Concerning the Set Existence Checking Strategy, a lexicographic tree
implemented with arrays is used. The number of nodes is clearly bounded by $\|M\|$.
And since each node requires an array of size $|J|$, this strategy uses $O(|J| \times \|M\|)$
memory space.
     For the $\phi_{a,b}$ Relation Checking Strategy, no lexicographic tree is used.
However, the collection $M$ needs to be present in memory. Thus, this strategy
requires $O(\|M\|)$ memory space.
     Now, for the Classes Computation Strategy, the lexicographic tree is
implemented using lists. Such an implementation requires $O(\|M\|)$ memory space.
But $M$ is not the collection stored in the tree. Indeed, for each $m \in M$, we store
its $|m|$ subsets of size $|m| - 1$. And since $|m|$ is bounded by $|J|$ we conclude that
this strategy requires $O(|J| \times \|M\|)$ memory space.
     The conclusion seems to be that whichever strategy is used, at least $O(\|M\|)$
memory space is needed. But one can notice that, according to Property
2, not all sets in $M$ need to be present in memory at the same time. Indeed,
if $m = \phi_{a,b}(m')$, then $m$ and $m'$ have the same size. The idea is then to
partition $M$ according to the size of the sets. Thus, only sets of the same
size need to be present in memory at the same time. Computations done for
those sets are totally independent of computations done for sets of a different
size. Note that the partitioning of $M$ can be done in $O(\|M\|)$ time,
with a single scan of the collection. Thus, the partitioning does not change the
overall complexity of the different strategies.

[Figure 2: lexicographic tree. Root-to-leaf paths, with the associated Union value
in parentheses: a-e-f (h), a-e-h (f), a-f-h (e), b-d-f (h), b-d-h (f), b-e-f (h),
b-e-h (f), b-f-h (d,e), d-f-h (b), e-f-h (a,b).]

Fig. 2. The lexicographic tree corresponding to the collection $M =
\{\{a, e, f, h\}, \{b, e, f, h\}, \{b, d, f, h\}\}$. Circled nodes correspond to sets of $M$. The
Union value associated with these nodes is shown in parentheses.

    Another remark is that since computations by size are totally independent,
one could easily implement a distributed version of the strategies. This could
possibly speed up the computation of the distance matrix. But note that in this
case, the distance matrix should also be distributed. A natural solution is to
initialize local distance matrices with the local partition. After the local computations
(for sets of the same size), all the distance matrices are merged together
(by adding all the obtained values) in order to obtain the global distance matrix.
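
The partition-and-merge idea can be illustrated with the short Python sketch below. For brevity the local matrix is computed by brute force from the definition (number of sets of the part whose image under $\phi_{a,b}$ is not in the part); any of the three strategies could be substituted. The function names are ours, and $\phi_{a,b}$ is assumed to swap $a$ and $b$.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]
Matrix = Dict[Tuple[str, str], int]


def partition_by_size(collection: Iterable[Iterable[str]]) -> Dict[int, Set[Itemset]]:
    """Single scan of the collection, grouping the sets by cardinality."""
    parts: Dict[int, Set[Itemset]] = defaultdict(set)
    for m in collection:
        fm = frozenset(m)
        parts[len(fm)].add(fm)
    return parts


def local_distance(part: Set[Itemset], J: List[str]) -> Matrix:
    """Distance matrix of one part; m ^ {a, b} swaps a and b when exactly one is in m."""
    return {(a, b): sum(1 for m in part
                        if (a in m) != (b in m) and (m ^ {a, b}) not in part)
            for a, b in combinations(J, 2)}


def merged_distance(collection: Iterable[Iterable[str]], items: Iterable[str]) -> Matrix:
    J = sorted(items)
    total = {pair: 0 for pair in combinations(J, 2)}
    for part in partition_by_size(collection).values():
        # Each part could be processed on a different machine.
        for pair, value in local_distance(part, J).items():
            total[pair] += value      # merge by adding the local values
    return total
```

The merge is correct because $\phi_{a,b}$ preserves the size of a set, so whether the image of $m$ is present can be decided inside the part that contains $m$.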


4    Clone classes computation

The problem stated in [1] and in [3] is the computation of clone classes. We thus
give the algorithm which computes clone classes from a distance matrix.
     First, let us recall that two items $a$ and $b$ are clones if and only if $d_M(a, b) =
0$. Since the clone relation is an equivalence relation, it defines a partition of the
set $J$.
     The principle of Algorithm 5 is the following. Let $a \in J$ be an item which has
not yet been assigned to a class. We then search for all remaining items $b$ whose
distance to $a$ is zero. All those items form a clone class with $a$ and thus
are removed from the list of items which are not assigned to a class. The class
of $a$ is then stored in a list $L$. The total complexity of the clone classes computation
is $O(|J| \times \|M\|)$ in time and space.
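
Given a distance matrix, the grouping step can be sketched in a few lines of Python. The matrix is assumed to be a dictionary keyed by pairs `(a, b)` with `a < b`, a convention of ours; the function name is also ours.

```python
from typing import Dict, Iterable, List, Tuple


def clone_classes(d: Dict[Tuple[str, str], int], items: Iterable[str]) -> List[List[str]]:
    """Group the items into clone classes (items at distance 0 from each other)."""
    remaining = sorted(items)
    classes: List[List[str]] = []
    while remaining:
        a = remaining.pop(0)              # a is smaller than every remaining item
        clones = [b for b in remaining if d[(a, b)] == 0]
        remaining = [b for b in remaining if b not in clones]
        classes.append([a] + clones)
    return classes
```

For instance, with `d = {("a", "b"): 0, ("a", "c"): 2, ("b", "c"): 2}` the function returns `[["a", "b"], ["c"]]`. Since the clone relation is an equivalence relation, comparing each remaining item with the representative `a` alone is sufficient.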


Algorithm 5: ComputeCloneClasses($M$)
   Data : A collection $M$ of sets defined over $J$.
   Result : The list $L$ of clone classes.
   begin
      $d_M$ = ComputeDistance($M$);
      $L = \emptyset$; temp = $J$;
      while temp $\neq \emptyset$ do
          choose $a \in$ temp;
          $l_a$ = newList(); $l_a = l_a \cup \{a\}$; temp = temp $\setminus \{a\}$;
          foreach $b \in$ temp do
              if $d_M(a, b) = 0$ then
                  $l_a = l_a \cup \{b\}$;
                  temp = temp $\setminus \{b\}$;

          $L = L + l_a$;
      return $L$;
   end


5    Conclusion

We have seen that if the problem is the computation of the distance matrix and
only basic incrementations or decrementations by 1 are allowed, then the minimal
time complexity for the update of the matrix is in $O(|J| \times \|M\|)$. Thus, under
this hypothesis, our algorithm is optimal. Figure 3 gives the different time and
space complexities obtained with our algorithms.


         Algorithm                        Time complexity               Space complexity
         Set Existence Checking           $|J|^2 \times \|M\|$          $|J| \times \|M\|$
         $\phi_{a,b}$ Relation Checking   $(|J| + |M|) \times \|M\|$    $\|M\|$
         Classes Computation              $|J| \times \|M\|$            $|J| \times \|M\|$
         Optimal matrix computation       $|J| \times \|M\|$            ?


              Fig. 3. Complexities of the algorithms presented in the paper.


    An open question is to know whether or not the distance matrix could be
incremented (or decremented) by more than 1 at each step. This could be the
only way of improving our algorithm. Another open question is to know if there
is a more efficient algorithm to compute the clone classes (for instance without
computing the distance matrix). Those are two questions we are investigating.

References
1. A. Gely, R. Medina, L. Nourine, and Y. Renaud. Uncovering and reducing hidden
   combinatorics in Guigues-Duquenne covers. In ICFCA'05, 2005.
2. H. Mannila and K.-J. Räihä. On the complexity of inferring functional dependencies.
   Discrete Applied Mathematics, 40(2):237–243, 1992.
3. R. Medina and L. Nourine. Clone items: a pre-processing information for knowledge
   discovery. Submitted.