=Paper=
{{Paper
|id=Vol-162/paper-7
|storemode=property
|title=Efficient algorithms for clone items detection
|pdfUrl=https://ceur-ws.org/Vol-162/paper7.pdf
|volume=Vol-162
|dblpUrl=https://dblp.org/rec/conf/cla/MedinaNR05
}}
==Efficient algorithms for clone items detection==
Efficient
Efficient algorithms
algorithms for
for clone
clone items
items detection
detection
Raoul Medina, Caroline Noyer, and Olivier Raynaud
Raoul Medina, Caroline Noyer and Olivier Raynaud
LIMOS - Université Blaise Pascal,
LIMOS -des
Campus universitaire Université
Cézeaux,Blaise Pascal,
Clermont-Ferrand, France
Campus universitaire des Cézeaux,
{medina, Clermont-Ferrand, France
raynaud}@isima.fr
{medina, raynaud}@isima.fr
Abstract. This paper presents efficient algorithms for clone items de-
tection in a binary relation. Best implementation of these algorithms has
O(|J|.||M||) time complexity, with J corresponding to the set of items of
the relation and ||M|| corresponding to the size of the relation. This re-
sult improves the previous algorithm given in [3] which is O(|J|2 .||M||).
Clone items have been introduced by Medina and Nourine to explain
why, sometimes, the number of rules of a minimum cover of a relation is
exponential with the number of items of the relation.
1 Introduction
The clone items notion has been introduced by Medina and Nourine in [3]. Aim
of their paper was to understand the combinatorial explosion that might arise
in a minimum basis of a relation. Most famous example of such exponential
minimum basis is given by Manilla and Rähiä in [2]. Medina and Nourine no-
ticed that in this example some items play symetrical role in the basis. Indeed,
for each rule containing a given item a in the antecedent there is a symetri-
cal rule where item a is replaced by item b. Such symetrical items are said to
be clone items. The clone notion is an equivalence relation and thus classes of
clone items can be found in a binary relation. In [3], the authors show how to
compute such clone classes and, from those classes, how to reduce the binary
relation in order to obtain a minimum basis with no clone items. The detection
algorithm as well as the reduction algorithm have both polynomial time com-
plexities. The obtained minimum basis is smaller than the original one (in the
case of the Mannila and Rähiä example it reduces to a single rule) and this later
can be reconstructed from the clone free minimum basis and the clone classes
information in polynomial time.
Recently, Gely et al. (in [1]) extended the notion of clone items (which are
defined on closed sets) to the notion of P-clone items (defined on pseudo-closed
sets) and to the notion of A-clone items (defined on what they call an ”atomized”
context). Their approach could be generalized to any generation problem where,
from a given relation, one wants to compute a (potentially exponential) collection
of sets verifying a property (e.g. the sets must be closed, or the sets must be
ideals, or the sets are pseudo-closed sets, etc...). The idea is then to reduce the
combinatorial explosion of the wanted collection by representing it by a clone
free collection and the classes of clone items of this collection. The problem is
Radim Bělohlávek, Václav Snášel (Eds.): CLA 2005, pp. 70–81, ISBN 80–248–0863–3.
Efficient algorithms for clone items detection 71
then to be able to compute the clone classes of the (potentially exponential)
collection without generating its sets. To solve this problem in polynomial time,
the general method could be as follows:
– Let M be the (potentially exponential) collection of sets verifying a property
over a given binary relation R. Compute in polynomial time a collection M0
such that:
1. the size of M0 is polynomial in the size of R,
2. items a and b are clone in M if and only if they are clone items in M0 .
– Detect the clone classes in M0 .
Our paper focuses on the detection phase of this approach. The clone classes
computation algorithm given in [3] has an O(|J|2 .||M||) time complexity, where
J is the set
P of items and ||M|| is the size of the input collection. In other words,
||M|| = m∈M |m|. In this paper, we present different algorithms to solve the
clone classes computation. The best complexity we obtain is in O(|J|.||M||).
This paper is organized as follows: section 2 formally defines the problem in
terms of collection of sets and introduces the corresponding Abstract Data Type
which will be used by our algorithms. Section 3 describes three computation
strategies and the corresponding time complexities are studied. Section 4 shows
how to take advantage of those algorithms in order to compute the clone items
classes as defined in [3].
2 General context and definitions
In this section, we first formally define the studied problem. Then we present
the Abstract Data Structure called Map used in our algorithms and discuss on
its possible implementations.
2.1 Clone items in a Sets Collection
Let J be a set of items {x1 , ..., x|J| } and M a sets collection on J. We denote
by ϕa,b : 2J → 2J the mapping which associates to any subset of J its image
by swapping items a and b. More formally :
(m \ {a}) ∪ {b} if b 6∈ m and a ∈ m
ϕa,b (m) = (m \ {b}) ∪ {a} if a 6∈ m and b ∈ m
m otherwise
Definition 1. Let M be a collection of sets defined on J. We say that items a
and b are clone items in M if and only if ∀m ∈ M, ϕa,b (m) ∈ M.
The clone items concept is a binary one. To the question ”are a and b clone
items ?”, only the answer ”true” or ”false” is possible. It could be interesting
to have more precisions when the negative response is given. Are a and b very
far from being clone items or why are not they clone ? For this purpose, we
introduce a measure to qualify the clone property. This measure will represent
72 Raoul Medina, Caroline Noyer, Olivier Raynaud
a distance between two items a and b, based on the definition of the clone items
property. This distance is exactly the number of elements m of M which do not
have an image in M when applying the swapping function ϕa,b (m). When this
distance is zero, a and b are clone items. The greater the distance is, the farther
a and b are to be clone. More formally:
Definition 2. Let M be a sets collection on J and let (a, b) in J 2 . We call
distance between a and b, denoted by dM (a, b), the mapping :
J2 → N
dM (a, b) → {m ∈ M | ϕa,b (m) 6∈ M}
Thanks to definition 2, clone items could be charaterized as follows :
Proposition 1. Let M be a sets collection on J and (a, b) in J 2 , a and b are
clone items if and only if dM (a, b) = 0.
The problem we study in this paper is the computation of distances between
all pair of items of J:
Problem 1 (Distance).
Data : a sets collection M on J;
Result : the matrix dM .
Here, we present the main property on which rely our algorithms. It charac-
terizes a couple (m, m0 ) of the sets collection M such that m = ϕa,b (m0 ).
Proposition 2. Let M be a sets collection defined over J, m and m0 two dis-
tinct sets of M and (a, b) two items of J such that a ∈ m and b ∈ m0 . Then the
following assertions are equivalent:
1. m = ϕa,b (m0 )
2. m0 = ϕa,b (m)
3. |m| = |m0 | and m \ m0 = {a} and m0 \ m = {b}
4. m \ {a} = m0 \ {b}
This property states that two sets m and m0 are their respective images by
the swapping function ϕ if and only if they have same size t and share t − 1
items. This property follows directly from the definition of the ϕ mapping.
2.2 Abstract Data Type : Key Mapping
Interface. We use a Map abstract data type similar to the Map interface of
Java language. This data structure maps keys to values. In our case, the keys
are the sets of the collection. The values mapped by the keys depend on the
algorithm.
This abstract data type supplies the following operators:
– new() operator: creates a Map object and returns an empty map.
Efficient algorithms for clone items detection 73
– get(e) operator: returns the value associated to the key e if this key maps
a value, or Nil otherwise.
– put(e,value) operator: inserts set e in the map and associates value to it.
Time complexities of those operators deeply rely on the data structure used
for the implementation of the Map data type. We propose an implementation
which takes advantage of the type of the keys, i.e. sets. To implement the Map
type we propose a lexicographic tree: itemsets are represented by branches of
the tree.
Implementation: lexicographic tree. We first give a formal definition of a
lexicographic tree corresponding to a sets collection.
Definition 3. Let M be a sets collection defined over J, with a total order on
J denoted by