=Paper=
{{Paper
|id=Vol-433/paper-16
|storemode=property
|title=GARM: Generalized Association Rule Mining
|pdfUrl=https://ceur-ws.org/Vol-433/paper12.pdf
|volume=Vol-433
}}
==GARM: Generalized Association Rule Mining==
GARM: Generalized Association Rule Mining
T. Hamrouni1,2 , S. Ben Yahia1 and E. Mephu Nguifo2
1
Department of Computer Science, Faculty of Sciences of Tunis, Tunis, Tunisia.
{tarek.hamrouni, sadok.benyahia}@fst.rnu.tn
2
CRIL-CNRS, IUT de Lens, Lens, France.
{hamrouni, mephu}@cril.univ-artois.fr
Abstract. A thorough scrutiny of the literature dedicated to association rule min-
ing highlights that a determined effort focused so far on mining the co-occurrence
relations between items, i.e., conjunctive patterns. In this respect, disjunctive pat-
terns presenting knowledge about complementary occurring items were neglected
in the literature. Nevertheless, recently a growing number of works is shedding
light on their importance for the sake of providing a richer knowledge for users.
For this purpose, we propose in this paper a new tool, called GARM, aiming at
building a partially ordered structure amongst some particular disjunctive pat-
terns, namely the disjunctive closed ones. Starting from this structure, deriv-
ing generalized association rules, i.e., those offering conjunctive, disjunctive and
negative connectors between items, becomes straightforward. Our experimental
study focuses on mining performance as well as on the number of extracted rules,
and shows the utility of the proposed approach.
Keywords: Data mining, disjunctive closed pattern, frequent essential pattern,
disjunctive support, equivalence class, partially ordered structure, generalized as-
sociation rules.
1 Introduction and Motivations
Association rule mining is a fundamental topic in Data mining [1]. It has been exten-
sively investigated since its inception. Its key idea consists in looking for causal rela-
tionships between sets of items, commonly called itemsets, where the presence of some
items suggests that others follow from them. A typical example of a successful appli-
cation of association rules is the market basket analysis, where the discovered rules can
lead to important marketing and management strategic decisions. Recently, mining as-
sociation rules was extended to various pattern classes like sequential patterns, graphs,
etc. Nevertheless, the main criticism that can be addressed to the contributions related to
association rules is their focus on co-occurrences between items [2], probably as a her-
itage of the market basket analysis framework. Indeed, almost all related works neglect
the other kinds of relations, like mutually exclusive occurrences [3], that can also bring
information of interest to users.
In this paper, we propose a new tool, called GARM 1 , covering the whole process
allowing the extraction of generalized association rules. These latter generalize classical
rules – positive rules – to offer disjunctive and negative connectors between items,
1
GARM is the acronym of generalized association rule miner.
c Radim Belohlavek, Sergei O. Kuznetsov (Eds.): CLA 2008, pp. 145–156,
ISBN 978–80–244–2111–7, Palacký University, Olomouc, 2008.
146 Tarek Hamrouni, Sadok Ben Yahia, Engelbert Mephu Nguifo
in addition to the conjunctive one [4]. Our tool includes a first component that
extracts a concise representation of frequent patterns based on disjunctive
patterns. A second component then partially orders these patterns w.r.t. set
inclusion. Once the partially ordered structure is obtained, generalized association rules
can be easily derived by the last component of our tool.
Notably, extracting an exact concise representation of frequent patterns in the
first component of the process makes it possible to derive exactly the different supports
of each frequent pattern. This in turn enables us to compute the exact values of quality
measures. Indeed, it was shown in [5] that almost all interestingness measures for
association rules are expressed depending on the support of the rule and those of its
associated premise and conclusion. In addition, using disjunctive patterns – in particu-
lar closed and essential patterns [6] – will provide an interesting starting point towards
mining association rules conveying complementary occurrences between items, rather
than co-occurrences. Indeed, these latter relationships – co-occurrences within literals 2
– were explored in-depth in the literature through association rules having conjunction
of literals, called literalsets, in premise and conclusion. This leads to what is commonly
known as positive and negative association rules. Disjunctive association rules,
in contrast, have only recently begun to attract the interest of researchers.
In general, generalized association rules are useful in many applications. In partic-
ular, disjunctive association rules – having disjunction of items either in premise or in
conclusion – were considered for two main purposes: On the one hand, they were used
as an intermediate step for defining some concise representations for frequent patterns
[1]. On the other hand, they were exploited to provide users with new forms of asso-
ciation rules [7, 8]. For example, the added-value of such association rules has been
recently highlighted in [2]. It is however important to note that generalized association
rules can be considered as particular GUHA rules [9].
Note that we restrict ourselves in this work to disjunctive closed patterns whose
smallest seeds, i.e., essential patterns, are frequent with respect to a minimum conjunctive
support threshold. This choice is motivated by our aim of retaining the spirit of
association rule mining where this threshold, as well as the confidence-based one, is
used to dramatically limit the number of extracted association rules. In addition, the use
of a partially ordered structure will make it possible to select representative subsets of
rules to be extracted. This nucleus of rules helps to avoid overwhelming users
with very large rule lists.
The remainder of the paper is organized as follows. The next section discusses the
related work. Section 3 recalls the key notions used throughout this paper. The
structural properties of the disjunctive search space are explored in Section 4.
Section 5 gives a detailed description of the GARM tool, whose purpose is to offer a
complete process for the extraction of generalized association rules. Experimental
results focusing on the mining time as well as the number of extracted rules are reported
and discussed in Section 6. Section 7 concludes the paper and points out future work.
2
A literal is an item or the negation of an item.
2 Related Work
Contributions related to association rule mining mainly concentrated on the classical
rule form, namely that presenting conjunction of items in both premise and conclusion
parts. In this respect, many concise representations for such rules were proposed in the
literature [10]. Recently, some works focused on introducing negative items. Never-
theless, the majority of items are not present in each transaction leading to explosive
amounts of association rules with negation. Thus, existing approaches have tried to
address this problem through the use of additional background information about the
data, incorporating attribute correlations, and additional rule interestingness measures,
etc. Here we mainly detail the few related works on association
rules relying on the disjunctive connector between items.
Some works [7, 8] were interested in using the disjunction connector within the
association rule mining issue to define what is called generalized association rules.
These rules have attracted the interest of many researchers since they offer richer types of
knowledge in many applications. In addition to the inclusive disjunction operator, i.e.,
the operator ∨, Nanavati et al. in [8] were also interested in the exclusive disjunction
operator, denoted ⊕. The authors hence proposed two kinds of rules which are the
simple disjunctive rules and the generalized disjunctive ones. Simple disjunctive rules
are those having either the premise or the conclusion (i.e., not simultaneously both)
composed by a disjunction of items. This disjunction can be inclusive (the simultaneous
occurrence of items is possible) or exclusive (two distinct items cannot occur together).
On the other hand, generalized disjunctive rules are disjunctive rules whose premises
or conclusions contain a conjunction of disjunctions. These disjunctions can either be
inclusive or exclusive. In [7], the author mainly focuses on extracting association rules
whose conclusions contain mutually exclusive items, i.e., the presence of one of
them leads to the absence of the others, which is expressed in [8] using the operator ⊕.
Other forms of generalized association rules were also described in [11]. In [12], Shima
et al. extract what they called disjunctive closed rules. In their work, a disjunctive closed
rule simply stands for a clause under the disjunctive normal form (DNF) such that its
disjuncts are constituted by frequent closed patterns. Elble et al. used disjunctive rules
to handle numerical attributes by considering disjunctions between intervals [13]. This
latter work extends other ones taking also into account categorical attributes (see [13]
for references). Finally, it is worth noting that the disjunction connector has also been
used to define some concise representations of frequent patterns through the so-called
disjunctive rule (see for example [1] for references).
3 Basic Concepts
In this section, we briefly sketch the key notions that will be of use throughout the paper.
Definition 1. An extraction context is a triplet K = (O, I, R) where O and I are,
respectively, a finite set of objects (or transactions) and items (or attributes), and R ⊆
O × I is a binary relation between the objects and items. A couple (o, i) ∈ R denotes
that the object o ∈ O contains the item i ∈ I.
Example 1. We will consider in the remainder a context that consists of transactions
(1, AB ), (2, ACD ), (3, CDE ), (4, DEF ), (5, ABCDE ), and (6, ABC ) 3 .
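Definition 2. (Supports of a pattern) Let $\mathcal{K} = (\mathcal{O}, \mathcal{I}, \mathcal{R})$ be a context and $I \subseteq \mathcal{I}$ be a
pattern. We mainly distinguish three kinds of supports related to $I$:
$$\mathrm{Supp}(\wedge I) = |\{o \in \mathcal{O} \mid \forall i \in I,\ (o, i) \in \mathcal{R}\}|$$
$$\mathrm{Supp}(\vee I) = |\{o \in \mathcal{O} \mid \exists i \in I,\ (o, i) \in \mathcal{R}\}|$$
$$\mathrm{Supp}(\overline{I}) = |\{o \in \mathcal{O} \mid \forall i \in I,\ (o, i) \notin \mathcal{R}\}|$$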
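Roughly speaking, the semantics of the aforementioned supports is as follows:
• $\mathrm{Supp}(\wedge I)$, the conjunctive support, is the number of objects containing all items of $I$.
• $\mathrm{Supp}(\vee I)$, the disjunctive support, is the number of objects containing at least one item of $I$.
• $\mathrm{Supp}(\overline{I})$, the negative support, is the number of objects that do not contain any item of $I$.
Note also that $\mathrm{Supp}(\vee I)$ and $\mathrm{Supp}(\overline{I})$ are two complementary quantities w.r.t. $|\mathcal{O}|$ in
the sense that: $\mathrm{Supp}(\vee I) + \mathrm{Supp}(\overline{I}) = |\mathcal{O}|$.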
Example 2. Consider our running context. We have $\mathrm{Supp}(\wedge CDE) = |\{3, 5\}| = 2$,
$\mathrm{Supp}(\vee CDE) = |\{2, 3, 4, 5, 6\}| = 5$ and $\mathrm{Supp}(\overline{CDE}) = |\{1\}| = 1$.
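The three supports of Definition 2 can be illustrated on the running context with a few lines of Python. This is only a minimal sketch: the names `CONTEXT`, `conj_supp`, `disj_supp` and `neg_supp` are ours and do not come from the GARM implementation, and the context is scanned directly rather than through any concise representation.

```python
# Running context of Example 1: object identifier -> set of items.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}

def conj_supp(pattern, context=CONTEXT):
    """Conjunctive support: objects containing every item of the pattern."""
    return sum(1 for items in context.values() if set(pattern) <= items)

def disj_supp(pattern, context=CONTEXT):
    """Disjunctive support: objects containing at least one item of the pattern."""
    return sum(1 for items in context.values() if set(pattern) & items)

def neg_supp(pattern, context=CONTEXT):
    """Negative support: objects containing no item of the pattern."""
    return sum(1 for items in context.values() if not (set(pattern) & items))

# Example 2 of the paper: the three supports of CDE.
print(conj_supp("CDE"), disj_supp("CDE"), neg_supp("CDE"))  # 2 5 1
```

The disjunctive and negative supports of any pattern add up to the number of objects, here 6.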
Hereafter, Supp(∧ I ) will simply be denoted Supp(I ). In addition, if there is no risk of
confusion, the conjunctive support will simply be called support. A pattern I is said to
be frequent if Supp(I ) is greater than or equal to a minimum support threshold, denoted
minsupp. Since the set of frequent patterns is an order ideal, the set of items I will be
considered as only containing frequent items. Lemma 1 states that conjunctive supports
can be derived starting from disjunctive ones.
Lemma 1. [14] Let $I \subseteq \mathcal{I}$. The following equality holds:
$$\mathrm{Supp}(I) = \sum_{\emptyset \subset I' \subseteq I} (-1)^{|I'| - 1}\, \mathrm{Supp}(\vee I')$$
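Lemma 1 is an inclusion-exclusion identity, and it can be checked numerically on the running context. The sketch below is an illustration under our own naming (`conj_supp_from_disj` is a hypothetical helper, not part of GARM); it recomputes the conjunctive support of a pattern from the disjunctive supports of its non-empty subsets.

```python
from itertools import combinations

# Running context of Example 1: object identifier -> set of items.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}

def conj_supp(pattern):
    """Conjunctive support computed directly from the context."""
    return sum(1 for items in CONTEXT.values() if set(pattern) <= items)

def disj_supp(pattern):
    """Disjunctive support computed directly from the context."""
    return sum(1 for items in CONTEXT.values() if set(pattern) & items)

def conj_supp_from_disj(pattern):
    """Lemma 1: alternating sum of disjunctive supports over non-empty subsets."""
    items = sorted(set(pattern))
    total = 0
    for size in range(1, len(items) + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(items, size):
            total += sign * disj_supp(subset)
    return total

# For CDE: (4 + 4 + 3) - (5 + 5 + 4) + 5 = 2, as in Example 2.
print(conj_supp_from_disj("CDE"))  # 2
```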
4 Structural Properties of the Disjunctive Search Space
In this section, we will characterize disjunctive patterns through the associated equiva-
lence classes induced by the following closure operator:
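Definition 3. Let $\mathcal{K} = (\mathcal{O}, \mathcal{I}, \mathcal{R})$ be an extraction context. The disjunctive closure
operator $h$ is defined as follows [6]:
$$h : \mathcal{P}(\mathcal{I}) \rightarrow \mathcal{P}(\mathcal{I})$$
$$I \mapsto h(I) = \{i \in \mathcal{I} \mid (\forall o \in \mathcal{O})\ \big((o, i) \in \mathcal{R} \Rightarrow (\exists i_1 \in I)\ ((o, i_1) \in \mathcal{R})\big)\}.$$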
The disjunctive closure h(I ) of a pattern I is equal to the maximal set of items which
only appear in the transactions that contain at least one item of I. The closure operator h
induces an equivalence relation on the power-set of I, which partitions it into so-called
disjunctive equivalence classes. In each class, all the elements have the same disjunc-
tive support. The smallest incomparable elements, w.r.t. set inclusion, of a disjunctive
equivalence class are called essential patterns, while the disjunctive closed pattern is the
largest one [6]. These particular patterns are defined as follows.
3
We use a separator-free form for the sets, e.g., ABC stands for the set of items {A, B, C}.
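Definition 4.
• A pattern $I \subseteq \mathcal{I}$ is a disjunctive closed pattern if $I = h(I)$ or, equivalently,
$\mathrm{Supp}(\vee I) < \min\{\mathrm{Supp}(\vee I') \mid I' \subseteq \mathcal{I} \text{ s.t. } I \subset I'\}$.
• A pattern $I \subseteq \mathcal{I}$ is an essential pattern if $\forall I' \subset I$, $I \nsubseteq h(I')$ or, equivalently,
$\mathrm{Supp}(\vee I) > \max\{\mathrm{Supp}(\vee I') \mid I' \subseteq \mathcal{I} \text{ s.t. } I' \subset I\}$.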
Example 3. Consider our running context. The pattern CDEF is disjunctively closed,
while BE is not, since Supp(∨ BE ) = Supp(∨ BEF ). On the other hand, the pattern AC
is essential, while DE is not, since Supp(∨ DE ) = Supp(∨ D ).
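The notions of Definitions 3 and 4 can be made concrete with a brute-force sketch on the running context. It is not the DCPR_Miner algorithm: all names (`closure`, `is_essential`, `representation`) are hypothetical, every candidate pattern is enumerated, and supports are read directly from the context. Grouping the frequent essential patterns by their disjunctive closure yields the equivalence classes used in the remainder.

```python
from itertools import combinations

# Running context of Example 1: object identifier -> set of items.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}
ITEMS = sorted(set().union(*CONTEXT.values()))

def conj_supp(pattern):
    return sum(1 for items in CONTEXT.values() if set(pattern) <= items)

def disj_supp(pattern):
    return sum(1 for items in CONTEXT.values() if set(pattern) & items)

def closure(pattern):
    """Disjunctive closure h: items that occur only in objects covered by the pattern."""
    pattern = set(pattern)
    covered = {o for o, items in CONTEXT.items() if pattern & items}
    return {i for i in ITEMS
            if all(o in covered for o, items in CONTEXT.items() if i in items)}

def is_essential(pattern):
    """Essential: removing any single item strictly lowers the disjunctive support.
    Checking immediate subsets suffices because this support is monotone."""
    pattern = set(pattern)
    return all(disj_supp(pattern - {i}) < disj_supp(pattern) for i in pattern)

def representation(minsupp):
    """Closure -> (disjunctive support, frequent essential patterns), by brute force."""
    table = {}
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            if conj_supp(combo) >= minsupp and is_essential(combo):
                key = "".join(sorted(closure(combo)))
                table.setdefault(key, (disj_supp(combo), []))[1].append("".join(combo))
    return table

print("".join(sorted(closure("BE"))))            # BEF
print(is_essential("AC"), is_essential("DE"))    # True False
```

Calling `representation(1)` returns ten closures, among them `ABC` with disjunctive support 5 and essential patterns `AC` and `BC`.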
In the remainder, $\mathcal{FEP}_{\mathcal{K}}$ (footnote 4) denotes the set of frequent essential patterns associated
to a given context $\mathcal{K}$ and a fixed minsupp value. The associated set of disjunctive closures
will further be denoted $\mathcal{EDCP}_{\mathcal{K}}$ (footnote 5). This latter set is hence equal to $\{h(I) \mid I \in \mathcal{FEP}_{\mathcal{K}}\}$.
To establish the link with conjunctive equivalence classes (which gather patterns having
the same Galois closure [15]), we notice that essential patterns (resp. disjunctive closed
patterns) are the counterparts of minimal generators, also known as free-sets (resp. closed
patterns) (see [1] for references). These latter patterns were at the basis of the main concise
representations of association rules that were proposed in the literature [10]. This clearly
motivates the use of their counterparts within the disjunctive search space.
5 Detailed Description of the GARM Tool
As mentioned in the first section, the GARM tool is composed of three complemen-
tary components which are as follows: (i) Extracting an exact concise representation
of frequent patterns based on disjunctive closed patterns and frequent essential ones.
(ii) Building a partially ordered structure w.r.t. set inclusion within disjunctive closed
patterns. Each one of these latter will be accompanied by its set of frequent essential
patterns. (iii) Deriving generalized association rules from the built structure.
5.1 Extracting a New Concise Representation based on Disjunctive Patterns
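Our representation is based on the sets $\mathcal{FEP}_{\mathcal{K}}$ and $\mathcal{EDCP}_{\mathcal{K}}$, as stated by Theorem 1.
Theorem 1. The set $\mathcal{EDCP}_{\mathcal{K}} \cup \mathcal{FEP}_{\mathcal{K}}$ is an exact concise representation of the set of
frequent patterns $\mathcal{FP}_{\mathcal{K}}$ [16].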
Example 4. Figure 1 (Left) lists the set of disjunctive closed patterns associated to
the running context. For each closed pattern, its associated disjunctive support and
frequent essential patterns, for minsupp = 1, are also given.
This representation will be denoted $\mathcal{DSSR}_{\mathcal{K}}$ (footnote 6). It is extracted thanks to an
adaptation of our DCPR_Miner (footnote 7) algorithm [17], which constitutes the first component of the
4
Stands for frequent essential patterns.
5
Stands for essential disjunctive closed patterns.
6
Stands for disjunctive search space-based representation.
7
DCPR M INER is the acronym of disjunctive closed pattern-based representation miner.
(Left)
EDCP_K    Disj. Supp.    FEP_K
B         3              B
C         4              C
F         1              F
AB        4              A
EF        3              E
ABC       5              AC, BC
BEF       5              BE
DEF       4              D
CDEF      5              CD, CE
ABCDEF    6              AD, AE, BD, BCE

(Right) Nodes of the structure, from top to bottom, each written as
({frequent essential patterns}: disjunctive closed pattern, disjunctive support);
the links between nodes follow set inclusion:
({AD, AE, BD, BCE}: ABCDEF, 6)
({AC, BC}: ABC, 5) ({BE}: BEF, 5) ({CD, CE}: CDEF, 5)
({A}: AB, 4) ({D}: DEF, 4)
({E}: EF, 3)
({B}: B, 3) ({C}: C, 4) ({F}: F, 1)
∅

Fig. 1. (Left) The set EDCP_K and the associated disjunctive supports and frequent essential
patterns for minsupp = 1. (Right) The equivalence classes partially ordered w.r.t. set inclusion.
GARM tool. Starting from $\mathcal{DSSR}_{\mathcal{K}}$, the conjunctive and negative supports of frequent
patterns can thus be deduced using disjunctive supports. This representation also allows
the derivation of the support of each literalset whose positive variation is based on a
frequent pattern. This is carried out using the following formula [4]:
$$\mathrm{Supp}(x_1 \wedge x_2 \wedge \ldots \wedge x_n \wedge \overline{y_1} \wedge \overline{y_2} \wedge \ldots \wedge \overline{y_m}) = \sum_{S \subseteq \{y_1, \ldots, y_m\}} (-1)^{|S|}\, \mathrm{Supp}(x_1 \wedge x_2 \wedge \ldots \wedge x_n \wedge S),$$
such that its positive variation, namely $\{x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m\}$, belongs to $\mathcal{FP}_{\mathcal{K}}$.
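The literalset formula can be sketched as follows. For simplicity, and as an assumption of this illustration, the conjunctive supports are read directly from the running context instead of being regenerated from the concise representation; `literalset_supp` and `brute_force` are hypothetical names.

```python
from itertools import combinations

# Running context of Example 1: object identifier -> set of items.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}

def conj_supp(pattern):
    return sum(1 for items in CONTEXT.values() if set(pattern) <= items)

def literalset_supp(positive, negative):
    """Support of x1 ^ ... ^ xn ^ not(y1) ^ ... ^ not(ym) by inclusion-exclusion
    over the subsets S of the negated items."""
    negative = sorted(set(negative))
    total = 0
    for size in range(len(negative) + 1):
        for subset in combinations(negative, size):
            total += (-1) ** size * conj_supp(set(positive) | set(subset))
    return total

def brute_force(positive, negative):
    """Direct count, used here only to cross-check the formula."""
    return sum(1 for items in CONTEXT.values()
               if set(positive) <= items and not (set(negative) & items))

# Objects containing A but neither C nor D: only object 1 (AB).
# Formula: Supp(A) - Supp(AC) - Supp(AD) + Supp(ACD) = 4 - 3 - 2 + 2.
print(literalset_supp("A", "CD"))  # 1
```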
5.2 Building the Partially Ordered Structure
In this section, we propose a new algorithm, called POSB (footnote 8), for partially ordering
disjunctive closed patterns w.r.t. set inclusion. The POSB algorithm takes as
input the representation DSSR_K, in which each disjunctive closed pattern is associated
with its set of frequent essential patterns and its disjunctive support. A node in the partially
ordered structure is associated with each disjunctive closed pattern. The pseudo-code
of POSB is shown in Algorithm 1. Our algorithm inherits two main optimizations
used in the algorithm proposed by Valtchev et al. [18], namely the sorting of disjunctive
closed patterns and the use of a border. Indeed, the set of disjunctive closed patterns
EDCP_K is sorted by increasing pattern size. Since distinct closures of equal size cannot be
comparable, this sorting avoids unnecessary comparisons. In addition, it guarantees
that the closure f under treatment is at least as large as every closure already treated. Thus,
it suffices to find its lower cover among the nodes already inserted in the structure. This lower
cover is composed of those closures which are immediately covered by f.
On the other hand, the border B is an anti-chain w.r.t. set inclusion containing max-
imal closures among those already treated. In fact, the Valtchev et al. algorithm con-
structs the Hasse diagram representing the subset-superset relationship among concepts
in the Galois lattice. It begins at the top of the lattice and then recursively identifies the
lower neighbors of each concept. Nevertheless, it is not directly adapted to our situa-
tion. Indeed, although the intersection of two disjunctive closed patterns is obviously
8
POSB is the acronym of partially ordered structure builder.
Algorithm 1: POSB
Input: The set EDCP_K of disjunctive closed patterns.
Output: The disjunctive closed patterns ordered by set inclusion.
Begin
    B := ∅;
    Foreach (f ∈ EDCP_K) do
        Prohibited_List := ∅;
        Foreach (b ∈ B) do
            inter := b ∩ f;
            If (inter = b) then
                LOWER_COVER_INSERTION(f, b);
                B := B \ {b};
            Else If (inter ≠ ∅) then
                LOWER_COVER_MANAGEMENT(f, b);
        B := B ∪ {f};
End
a disjunctive closed pattern, the latter does not necessarily belong to EDCP_K. This is
due to the fact that all of its essential patterns may be infrequent, in which case it has
already been pruned. For its part, the algorithm proposed in [18] relies on the fact that the
intersection of two concepts was already treated, so that it suffices to locate the corresponding
node within the Hasse diagram.
In Algorithm 1, disjunctive closed patterns are inserted one at a time into a partially
built structure until the entire structure is obtained. Let f be the current
disjunctive closed pattern to be inserted in the partially ordered structure. f is
compared to the elements of the border B. If an element b ∈ B is included in f, then it is an
element of its lower cover. A link between the node representing b and that representing
f is constructed by the LOWER_COVER_INSERTION procedure (cf. Algorithm 2).
The element b is then deleted from the border. If b is not included in f but
their intersection is not empty, then the LOWER_COVER_MANAGEMENT procedure
identifies the common immediate predecessors of b and f (cf. Algorithm 3). Finally, f
is added to the border. It is important to note that in the LOWER_COVER_MANAGEMENT
procedure, a prohibited list is associated with each disjunctive closed pattern to be
inserted in the partially ordered structure. Indeed, when updating the precedence links
between disjunctive closed patterns, a node can be visited more than once since it can
be an immediate predecessor of many other nodes. This list avoids such useless
treatments by only allowing the visit of nodes that do not belong to it.
Example 5. The associated structure to our running context is given by Figure 1 (Right).
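The structure that POSB is meant to produce for the running context can be cross-checked with a deliberately naive computation. The sketch below is not POSB: it uses neither the border nor the prohibited list, and simply keeps, for each disjunctive closed pattern, the maximal closed patterns strictly included in it. The names `CLOSED`, `lower_covers` and `label` are ours.

```python
# Disjunctive closed patterns of the running context (Figure 1, Left).
CLOSED = ["B", "C", "F", "AB", "EF", "ABC", "BEF", "DEF", "CDEF", "ABCDEF"]

def lower_covers(closed_patterns):
    """Naive baseline: for each closure f, keep the maximal closures strictly below f."""
    nodes = sorted((frozenset(p) for p in closed_patterns), key=len)
    covers = {}
    for f in nodes:
        below = [g for g in nodes if g < f]          # proper subsets of f
        covers[f] = {g for g in below
                     if not any(g < other for other in below)}
    return covers

def label(pattern):
    """Separator-free form used in the paper, e.g. {A, B, C} -> 'ABC'."""
    return "".join(sorted(pattern))

for f, cover in lower_covers(CLOSED).items():
    print(label(f), "<-", sorted(label(g) for g in cover))
```

On this input, the immediate predecessors of `ABCDEF` are `ABC`, `BEF` and `CDEF`. The naive version compares all pairs of closures, which is the cost that the sorting and border optimizations described above aim to reduce.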
5.3 Deriving Generalized Association Rules
Once the partially ordered structure is built, (subsets of) generalized association
rules can be easily derived. An association rule R: X ⇒ Y based on a pattern Z, denoted
Z-based rule, is such that X = {x1, x2, ..., xn} ⊆ $\mathcal{I}$ and Y = {y1, y2, ..., ym} ⊆ $\mathcal{I}$ are
two patterns with X ∩ Y = ∅ and X ∪ Y = Z. An association rule is usually assessed
w.r.t. two statistical measures, namely the support and the confidence. The
formulae of these measures for an arbitrary rule are as follows:
Algorithm 2: LOWER_COVER_INSERTION
Input: A disjunctive closure f, and an element pred to be inserted in its lower cover.
Output: The updated lower cover of f.
Begin
    Foreach (l ∈ Lower_Cover(f)) do
        inter := l ∩ pred;
        If (inter = pred) then
            return;
        Else If (inter = l) then
            Lower_Cover(f) := Lower_Cover(f) \ {l};
    Lower_Cover(f) := Lower_Cover(f) ∪ {pred};
End

Algorithm 3: LOWER_COVER_MANAGEMENT
Input: A disjunctive closed pattern f, and an element b of the border B.
Output: The updated lower cover of f.
Begin
    Foreach (pred_b ∈ Lower_Cover(b)) do
        If (pred_b ∉ Prohibited_List) then
            inter := pred_b ∩ f;
            If (inter = pred_b) then
                LOWER_COVER_INSERTION(f, pred_b);
            Else If (inter ≠ ∅) then
                LOWER_COVER_MANAGEMENT(f, pred_b);
            Prohibited_List := Prohibited_List ∪ {pred_b};
End
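$$\mathrm{Supp}(X \Rightarrow Y) = \mathrm{Supp}(X \wedge Y), \quad \text{and} \quad \mathrm{Conf}(X \Rightarrow Y) = \frac{\mathrm{Supp}(X \wedge Y)}{\mathrm{Supp}(X)}$$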
A rule is said to be exact if its confidence is equal to 1. Otherwise, it is said to be
approximate. In addition, it is said to be interesting or valid if its support and confidence
values are greater than or equal to their respective minimum thresholds minsupp and
minconf. It is clear that whenever we are able to evaluate Supp(X ⇒ Y ), the derivation
of the confidence value will be straightforward.
Let us now adapt the association rule framework to our context. As shown in Sub-
section 5.1, the DSSRK representation allows deriving the disjunctive, conjunctive and
negative supports of each set of positive and negative items whose positive variation is
based on a frequent pattern. In the sequel, we present an overview of the process by
which we retrieve generalized association rules and evaluate their associated supports
through traversing the partially ordered structure. Rules can be classified according to
the number of nodes required for their extraction. We then distinguish two cases:
1. An intra-node rule: it requires a single node and highlights relationships between
a frequent essential pattern and its disjunctive closure f (here Z = f ).
2. An inter-nodes rule: it is extracted using two nodes N1 and N2 s.t. the associated
disjunctive closure of N1 , denoted f1 , is one of the immediate predecessors of that
of N2 , denoted f2 . Let e1 be a frequent essential pattern of f1 . An inter-nodes rule
describes relationships between either f1 and f2 or e1 and f2 (here Z = f2 ).
Both kinds of rules – intra-node and inter-nodes – can be either exact or approximate.
Different forms of generalized association rules can be extracted starting from our
representation (cf. [16] for a detailed description). To limit the number of possible ex-
tracted rule forms, we mainly focus here on the following ones:
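1. Form 1: disjunction of items in premise and conclusion, $\vee X \Rightarrow \vee Y$:
$\mathrm{Supp}(\vee X \Rightarrow \vee Y) = \mathrm{Supp}(\vee X \wedge \vee Y) = \mathrm{Supp}(\vee X) + \mathrm{Supp}(\vee Y) - \mathrm{Supp}((\vee X) \vee (\vee Y))$
$= \mathrm{Supp}(\vee X) + \mathrm{Supp}(\vee Y) - \mathrm{Supp}(\vee Z)$,
2. Form 2: negation of items in premise and conclusion, $\overline{X} \Rightarrow \overline{Y}$:
$\mathrm{Supp}(\overline{X} \Rightarrow \overline{Y}) = \mathrm{Supp}(\overline{X} \wedge \overline{Y}) = \mathrm{Supp}(\overline{(\vee X) \vee (\vee Y)}) = \mathrm{Supp}(\overline{Z}) = |\mathcal{O}| - \mathrm{Supp}(\vee Z)$,
3. Form 3: disjunction of items in premise and negation of items in conclusion, $\vee X \Rightarrow \overline{Y}$:
$\mathrm{Supp}(\vee X \Rightarrow \overline{Y}) = \mathrm{Supp}(\vee X \wedge \overline{Y}) = \mathrm{Supp}((\vee X) \vee (\vee Y)) - \mathrm{Supp}(\vee Y) = \mathrm{Supp}(\vee Z) - \mathrm{Supp}(\vee Y)$, and,
4. Form 4: negation of items in premise and disjunction of items in conclusion, $\overline{X} \Rightarrow \vee Y$:
$\mathrm{Supp}(\overline{X} \Rightarrow \vee Y) = \mathrm{Supp}(\overline{X} \wedge \vee Y) = \mathrm{Supp}((\vee X) \vee (\vee Y)) - \mathrm{Supp}(\vee X) = \mathrm{Supp}(\vee Z) - \mathrm{Supp}(\vee X)$,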
where either X or Y is a frequent essential pattern or a disjunctive closed one, and Z =
X ∪ Y is a disjunctive closed pattern (as described above). For each rule, the support
of Z is known. It is the same for either X or Y since one of them is assumed to be a
frequent essential pattern or a disjunctive closed pattern. For the sake of simplicity, we
assume in the remainder that X is a frequent essential pattern or a disjunctive closed
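pattern. Since Y = Z\X, Y does not necessarily belong to DSSR_K and may not even
be a frequent pattern. Nevertheless, its disjunctive support is required to evaluate
that of the associated rule. To this end, we bound the support of Y using a lower bound,
denoted lb_Supp, and an upper bound, denoted ub_Supp, computed as follows:
• $\mathrm{lb\_Supp}(\vee Y) = \max\{\mathrm{Supp}(\vee e) \mid e \in \mathcal{FEP}_{\mathcal{K}} \text{ and } e \subseteq Y\}$,
• $\mathrm{ub\_Supp}(\vee Y) = \min\{\mathrm{Supp}(\vee f) \mid f \in \mathcal{EDCP}_{\mathcal{K}} \text{ and } Y \subseteq f\}$.
Both bounds follow from the monotonicity of the disjunctive support w.r.t. set inclusion.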
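The four support identities given for Forms 1 to 4 only involve disjunctive supports, which can be checked by brute force on the running context. The following sketch is an illustration with hypothetical names (`form_supports`, `direct_counts`); it computes the disjunctive supports directly from the context, whereas GARM would read or bound them from the representation.

```python
from itertools import combinations

# Running context of Example 1: object identifier -> set of items.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}

def objects_with_any(pattern):
    """Objects containing at least one item of the pattern."""
    return {o for o, items in CONTEXT.items() if set(pattern) & items}

def disj_supp(pattern):
    return len(objects_with_any(pattern))

def form_supports(x, y):
    """Supports of Forms 1 to 4, using only disjunctive supports of X, Y and Z."""
    sx, sy = disj_supp(x), disj_supp(y)
    sz = disj_supp(set(x) | set(y))
    return {"form1": sx + sy - sz,          # (or X) and (or Y)
            "form2": len(CONTEXT) - sz,     # no item of X and no item of Y
            "form3": sz - sy,               # (or X) and no item of Y
            "form4": sz - sx}               # no item of X and (or Y)

def direct_counts(x, y):
    """Same four quantities counted directly on the objects."""
    ax, ay = objects_with_any(x), objects_with_any(y)
    everything = set(CONTEXT)
    return {"form1": len(ax & ay), "form2": len(everything - ax - ay),
            "form3": len(ax - ay), "form4": len(ay - ax)}

print(form_supports("ABC", "DEF"))
# {'form1': 3, 'form2': 0, 'form3': 2, 'form4': 1}
```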
In this respect, if Y lies between a frequent essential pattern and its
disjunctive closure, then lb_Supp(∨ Y) = ub_Supp(∨ Y). Hence, the support and confidence
of the associated rule are computed exactly. Otherwise, these latter measures
are bounded by a minimal and a maximal possible value using the bounds associated
with Y. Such rules, further denoted approximated rules, are defined as follows:
Definition 5. An association rule is said to be approximated if it has either its support
or its confidence not exactly determined.
Then, only valid rules having minimum possible values of support and confidence
greater than or equal to minsupp and minconf, respectively, will be retained. Note that
an approximated rule is different from an approximate rule in the sense that the latter
has its support and confidence exactly computed (with a confidence not equal to 1),
which is not the case for the former. In this respect, approximated rules were shown to
convey interesting knowledge in the case of positive rules (see for example [19]).
Notably, the bounds lb_Supp(∨ Y) and ub_Supp(∨ Y) always exist. Indeed, on
the one hand, since the set of items is pruned w.r.t. minsupp, Y is composed
of frequent items even if it is itself infrequent. These items, taken as singletons, obviously
belong to FEP_K, which ensures the existence of the lower bound. On the other hand, Y is
covered by at least one disjunctive closed pattern, namely Z, which ensures the existence of the upper bound.
Example 6. Let minsupp = 1 and let minconf = 0.7. Consider the intra-node rule R1
of Form 1 based on the disjunctive closed pattern ABCDEF and its frequent essential
pattern BCE: ∨ BCE ⇒ ∨ ADF. Supp(R1) = Supp(∨ BCE) + Supp(∨ ADF) - Supp(∨
ABCDEF) = Supp(∨ ADF) (since h(BCE) = ABCDEF, and hence Supp(∨ BCE) =
Supp(∨ ABCDEF) = 6). Since ADF ∉ DSSR_K, we need to evaluate its support.
Since AD ⊆ ADF ⊆ h(AD) = ABCDEF (cf. Figure 1 (Left)), then
lb_Supp(∨ ADF) = ub_Supp(∨ ADF) = 6. Hence, Supp(R1) = 6 and
Conf(R1) = 6/6 = 1. R1 is hence a valid rule. Now, consider the inter-nodes rule R2 of Form
1 based on ABCDEF and one of its immediate predecessors, namely ABC (cf. Figure 1
(Right)): ∨ ABC ⇒ ∨ DEF. In this case, DEF ∈ EDCP_K. Hence, Supp(R2) = Supp(∨
ABC) + Supp(∨ DEF) - Supp(∨ ABCDEF) = 5 + 4 - 6 = 3, and Conf(R2) = 3/5 = 0.6.
Here, we took X = ABC. If we set Y = ABC, then the associated rule R3 = ∨ DEF
⇒ ∨ ABC has the same support as R2. Nevertheless, its confidence is equal to
3/4 = 0.75. Hence, R3 is a valid rule while R2 is not.
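The computations of Example 6 can be reproduced from the content of Figure 1 (Left) alone. The sketch below hard-codes that table in a dictionary named `DSSR` (our name) and assumes, as in the text, that the premise X and the pattern Z have exactly known disjunctive supports, so that only the support of Y is bounded. The helpers `bounds`, `exact_supp` and `form1_rule` are hypothetical and cover Form 1 only.

```python
# Figure 1 (Left): closed pattern -> (disjunctive support, frequent essential patterns).
DSSR = {
    "B": (3, ["B"]), "C": (4, ["C"]), "F": (1, ["F"]),
    "AB": (4, ["A"]), "EF": (3, ["E"]), "ABC": (5, ["AC", "BC"]),
    "BEF": (5, ["BE"]), "DEF": (4, ["D"]), "CDEF": (5, ["CD", "CE"]),
    "ABCDEF": (6, ["AD", "AE", "BD", "BCE"]),
}

def bounds(pattern):
    """(lb_Supp, ub_Supp) of the disjunctive support of a pattern."""
    y = set(pattern)
    lower = max(supp for supp, essentials in DSSR.values()
                for e in essentials if set(e) <= y)
    upper = min(supp for closed, (supp, _) in DSSR.items() if y <= set(closed))
    return lower, upper

def exact_supp(pattern):
    """Disjunctive support of a pattern whose two bounds coincide."""
    lower, upper = bounds(pattern)
    if lower != upper:
        raise ValueError("support is only bounded, not exactly known")
    return lower

def form1_rule(x, y):
    """Bounds on support and confidence of the Form 1 rule (or X) => (or Y)."""
    sx = exact_supp(x)
    sz = exact_supp(set(x) | set(y))
    y_lb, y_ub = bounds(y)
    supp_lb, supp_ub = sx + y_lb - sz, sx + y_ub - sz
    return supp_lb, supp_ub, supp_lb / sx, supp_ub / sx

print(form1_rule("BCE", "ADF"))  # R1: (6, 6, 1.0, 1.0)
print(form1_rule("ABC", "DEF"))  # R2: (3, 3, 0.6, 0.6)
print(form1_rule("DEF", "ABC"))  # R3: (3, 3, 0.75, 0.75)
```

With minconf = 0.7, R1 and R3 pass the confidence threshold while R2 does not, in line with the example.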
6 Experimental Results
Our experiments (footnote 9) focused on the mining time as well as the number of extracted valid
rules w.r.t. their associated type, i.e., exact, approximate or approximated. They were
carried out on a PC equipped with a 3 GHz Pentium processor and
1.75 GB of main memory, running the GNU/Linux distribution Fedora Core 7 (with
2 GB of swap memory). Our C++ implementation was compiled with gcc 4.1.2.
Table 1. Mining time (in seconds) of generalized association rules on benchmark contexts.
Context    minsupp (%)    Component 1    Component 2    Component 3    Total time
Connect    80.00          2.1530         0.0068         0.0380         2.1978
Connect    60.00          2.2807         0.0402         0.1618         2.4827
Connect    40.00          2.5571         1.0443         0.9813         4.5827
Pumsb      90.00          3.1875         0.0403         0.1015         3.3293
Pumsb      80.00          3.1581         2.9364         1.9693         8.0638
Pumsb      70.00          3.6630         19.5460        8.7276         31.9366
Kosarak    0.90           12.4551        0.1645         0.2239         12.8435
Kosarak    0.70           16.2936        0.6825         0.3794         17.3555
Kosarak    0.50           26.4491        5.6164         0.8738         32.9393
Retail     2.00           0.8471         0.0039         0.0135         0.8645
Retail     1.00           1.0803         0.0113         0.0334         1.1250
Retail     0.50           2.3909         0.1127         0.1331         2.6367
In the proposed experiments, the minconf value is set to the relative minimum support
value, i.e., minsupp / |O|. Table 1 presents the mining time, in seconds, of the three
components of GARM. This table shows the efficiency of our tool in extracting
generalized association rules. Indeed, even for the lowest tested minsupp values, the total
mining time stays below 33 seconds on all four contexts.
In this respect, the time consumed by each component, w.r.t. the total time,
9
Test contexts are available at: http://fimi.cs.helsinki.fi/data.
Table 2. Number of extracted generalized association rules on benchmark contexts.
Context    minsupp (%)    Exact     Approximate    Approximated    Total number
Connect    80.00          620       316            152             1,088
Connect    60.00          1,533     1,337          354             3,224
Connect    40.00          3,319     5,813          3,130           12,262
Pumsb      90.00          566       1,322          730             2,618
Pumsb      80.00          4,376     13,426         5,002           22,804
Pumsb      70.00          9,409     26,747         14,870          51,026
Kosarak    0.90           0         7,586          0               7,586
Kosarak    0.70           0         13,046         0               13,046
Kosarak    0.50           0         29,648         0               29,648
Retail     2.00           0         464            0               464
Retail     1.00           0         1,160          0               1,160
Retail     0.50           0         4,622          0               4,622
closely depends on the context characteristics. Nevertheless, the second and third
components are in general faster than the first one, the only exception in Table 1 being
Pumsb at minsupp = 70%. On the other hand, Table 2 highlights that
the number of extracted rules closely depends on the context density. Indeed, the denser
the context, the larger the associated equivalence classes are, and the greater
the number of frequent essential patterns and closed ones is. This increases the
number of rules for dense contexts, even for high minsupp values. Interestingly enough,
the number of exact and approximated rules for Retail and Kosarak is equal to 0
for the tested minsupp values. This is due to the fact that for both contexts, each essential
pattern is equal to its disjunctive closure, which is not the case for the Connect and
Pumsb contexts. Please note that the mining time and the number of extracted rules
when minconf varies are omitted here due to space limitations.
7 Conclusion and Perspectives
In this paper, we presented a complete tool, called GARM, allowing the extraction
of generalized association rules. Our tool is composed of three components. The first
extracts a concise representation of frequent patterns based on disjunctive closed ones.
The second partially orders these closures w.r.t. set inclusion. Once this structure is
built, extracting subsets of generalized association rules becomes a straightforward task
thanks to the last component. The experiments carried out showed the effectiveness of
the proposed tool. It is also important to mention that our GARM tool is easily adaptable
to the case where the input is composed of conjunctive (closed) patterns instead of
disjunctive ones.
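As an illustration of the second component, the sketch below orders a handful of closed patterns by set inclusion and keeps only the immediate (covering) links, i.e., the edges of the Hasse diagram. It is a naive pairwise procedure written for readability under the assumption that the closed patterns are already available as item sets; the function name `hasse_edges` and the sample patterns are hypothetical, and this is not the algorithm used in GARM.

```python
def hasse_edges(closed_patterns):
    """Return (lower, upper) pairs where `upper` immediately covers `lower`."""
    patterns = sorted({frozenset(p) for p in closed_patterns}, key=len)
    edges = []
    for index, lower in enumerate(patterns):
        # Strict supersets are necessarily longer, so they appear later.
        uppers = [p for p in patterns[index + 1:] if lower < p]
        for upper in uppers:
            # `upper` covers `lower` when no other superset lies in between.
            if not any(middle < upper for middle in uppers):
                edges.append((lower, upper))
    return edges


# Hypothetical closed patterns, used only to exercise the function.
closures = [{"a"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
for lower, upper in hasse_edges(closures):
    print(sorted(lower), "->", sorted(upper))
```

Here {a} is linked to {a, b} and {a, c}, each of which is linked to {a, b, c}; the transitive pair ({a}, {a, b, c}) is not reported. The cost grows quickly with the number of patterns, so a dedicated Hasse diagram construction would be preferable on real contexts.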
Other avenues for future work mainly address the following points. First, a detailed
comparison of our approach with the general GUHA approach [9] will be carried out.
Second, the relationships between the various rule forms will be studied. The purpose
is to retain only a lossless subset of rules from which the remaining redundant ones
can be derived. Adequate axiomatic systems thus need to be set up.
Acknowledgments: We would like to thank the anonymous reviewers for their helpful
comments and suggestions. We are also grateful to Mrs. Nassima Ben Younes for fruitful
discussions and help in the implementation of the tool. This work is supported by
the French-Tunisian project CMCU-Utique 05G1412.
References
1. Ceglar, A., Roddick, J.F.: Association mining. ACM Computing Surveys, volume 38(2)
(2006)
2. Steinbach, M., Kumar, V.: Generalizing the notion of confidence. Knowledge and Informa-
tion Systems, volume 12(3) (2007) 279–299
3. Tzanis, G., Berberidis, C.: Mining for mutually exclusive items in transaction databases.
International Journal of Data Warehousing and Mining, volume 3(3) (2007) 45–59
4. Toivonen, H.: Discovery of frequent patterns in large data collections. PhD thesis,
University of Helsinki, Helsinki, Finland (1996)
5. Hébert, C., Crémilleux, B.: A unified view of objective interestingness measures. In: Pro-
ceedings of the 5th International Conference Machine Learning and Data Mining in Pattern
Recognition, Springer-Verlag, LNCS, volume 4571. (2007) 533–547
6. Hamrouni, T., Denden, I., Ben Yahia, S., Mephu Nguifo, E.: A new concise representation of
frequent patterns through disjunctive search space. In: Proceedings of the 5th International
Conference on Concept Lattices and their Applications. (2007) 50–61
7. Kim, H.D.: Complementary occurrence and disjunctive rules for market basket analysis in
data mining. In: Proceedings of the 2nd IASTED International Conference Information and
Knowledge Sharing. (2003) 155–157
8. Nanavati, A.A., Chitrapura, K.P., Joshi, S., Krishnapuram, R.: Mining generalised disjunc-
tive association rules. In: Proceedings of the 10th International Conference on Information
and Knowledge Management. (2001) 482–489
9. Hájek, P., Havránek, T.: Mechanizing Hypothesis Formation: Mathematical Foundations for
a General Theory. Springer-Verlag (1978)
10. Kryszkiewicz, M.: Concise representations of association rules. In: Proceedings of the ESF
Exploratory Workshop on Pattern Detection and Discovery in Data Mining, Springer-Verlag,
LNCS, volume 2447. (2002) 92–109
11. Grün, G.A.: New forms of association rules. Technical Report TR 1998-15, School of
Computing Science, Simon Fraser University, Burnaby, BC, Canada (1998)
12. Shima, Y., Hirata, K., Harao, M., Yokoyama, S., Matsuoka, K., Izumi, T.: Extracting dis-
junctive closed rules from MRSA data. In: Proceedings of the 1st International Conference
on Complex Medical Engineering. (2005) 321–325
13. Elble, J., Heeren, C., Pitt, L.: Optimized disjunctive association rules via sampling. In:
Proceedings of the 3rd IEEE International Conference on Data Mining. (2003) 43–50
14. Galambos, J., Simonelli, I.: Bonferroni-type inequalities with applications. Springer (2000)
15. Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999)
16. Hamrouni, T., Denden, I., Ben Yahia, S., Mephu Nguifo, E.: Exploring the disjunctive search
space towards discovering new exact concise representations for frequent patterns. Technical
report, CRIL-CNRS of Lens, Lens, France (2007)
17. Denden, I., Hamrouni, T., Ben Yahia, S.: Efficient exploration of the disjunctive lattice
towards extracting concise representations of frequent patterns. To appear in the Proceedings
of the 9th African Conference on Research in Computer Science and Applied Mathematics
(in French). (2008)
18. Valtchev, P., Missaoui, R., Lebrun, P.: A fast algorithm for building the Hasse diagram of a
Galois lattice. In: Proceedings of the Conference on Combinatorics, Computer Science and
Applications. (2000) 293–306
19. Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: A condensed representation of Boolean
data for the approximation of frequency queries. Data Mining and Knowledge Discovery,
volume 7(1) (2003) 5–22