=Paper=
{{Paper
|id=None
|storemode=property
|title= Mining High Quality Association Rules Using Genetic Algorithms
|pdfUrl=https://ceur-ws.org/Vol-710/paper35.pdf
|volume=Vol-710
|dblpUrl=https://dblp.org/rec/conf/maics/Wakabi-WaiswaB11
}}
== Mining High Quality Association Rules Using Genetic Algorithms==
Peter P. Wakabi-Waiswa and Venansius Baryamureeba
Faculty of Computing & Information Technology,
Makerere University, Kampala, Uganda
Email: pwakabi@hotmail.com, barya@cit.mak.ac.ug
Abstract

The association rule mining (ARM) problem is a structured mechanism for unearthing hidden facts in large data sets and drawing inferences on how a subset of items influences the presence of another subset. ARM is computationally very expensive because the number of rules grows exponentially as the number of items in the database increases. This exponential growth is exacerbated further when data dimensions increase. The problem becomes more complex still when the different rule quality metrics must be taken into account. In this paper, we propose a genetic algorithm (GA) to generate high quality association rules with five rule quality metrics. We study the performance of the algorithm, and the experimental results show that it produces high quality rules in good computational times.

Keywords: confidence; support; interestingness; lift; J-Measure; genetic algorithms

1 Introduction

Since the association rule mining problem was proposed by (Agrawal, Imielinski, & Swami 1993), several algorithms have been developed. In the most commonly used approach, the rule generation process is split into two separate steps. The first step applies a minimum support threshold to find all frequent itemsets in a database, and the second step applies a minimum confidence constraint to the frequent itemsets to form the rules. The first step is computationally very expensive because finding all frequent itemsets in a database involves searching all possible itemsets, and the set of all possible itemsets grows exponentially with the number of items in the database. This exponential growth profoundly affects the performance of association rule mining algorithms (Agrawal, Imielinski, & Swami 1993).

Beyond these difficulties, most existing algorithms have been developed to produce simple, easy-to-understand association rules whose quality is measured by only one or two evaluation criteria, most commonly the confidence factor or predictive accuracy (Dehuri et al. 2008). These algorithms provide useful tools for descriptive data mining, but there are several further measures of rule quality, such as comprehensibility, confidence, J-measure, surprise, gain, chi-squared value, gini, entropy gain, Laplace, lift and conviction, that can be used to evaluate the rules (Carvalho, Freitas, & Ebecken 2005), (Freitas 1999).

Quite a number of research works have been carried out in this arena, but the results indicate that more innovative approaches are needed: algorithms that can handle multiple and increasing rule quality metrics while also improving efficiency (Hruschka et al. 1999), (Kotsiantis & Kanellopoulos 2006).

In this paper we propose a Multi-Objective Genetic Algorithm for Mining Association Rules (MOGAMAR), which generates association rules scored on five rule quality metrics: confidence, support, interestingness, lift and J-Measure. This permits the user to evaluate association rules on these different quality metrics in a single algorithm run. The main motivation for using a genetic algorithm (GA) is that a GA performs a global search and copes better with attribute interaction than the greedy rule induction algorithms often used in data mining tasks (Freitas 2007). Genetic algorithms are robust, with little likelihood of getting stuck in local optima, and they are highly parallel in nature, making them good candidates for distributed implementations (Liu & Kwok 2000a).

The rest of this paper is organized as follows. We present a detailed description of the proposed algorithm in Section 2. In Section 3 we evaluate the performance of the algorithm, and in Section 4 we conclude the paper.

2 The Multi-Objective Genetic Algorithm for Mining Association Rules

In this work we use the underlying structure of the object-oriented genetic algorithm proposed in (Davis 1991), with modifications to the representation of the individuals. In the rest of this section we give a detailed description of the proposed algorithm.

2.1 Representation of the Rules

We adopted a modified Michigan approach proposed in (Ghosh & Nath 2004), whereby the encoding/decoding scheme associates two bits with each attribute in the database. If these two bits are 00 then the attribute that follows them appears in the antecedent part of the rule, and if they are 11 then the attribute appears in the consequent part.
The other two combinations, 01 and 10, indicate the absence of the attribute from either of these parts. As an example, suppose there is a rule ACF → BE. It will be represented as shown in Figure 1.

00 A  11 B  00 C  01 D  11 E  00 F

Figure 1: Modified Michigan Rule Representation

Following this approach the algorithm can handle variable-length rules with more storage efficiency, adding only an overhead of 2k bits, where k is the number of attributes in the database. The downside of this approach is that it is not well suited to handling continuous-valued attributes. For handling real-valued attributes we have incorporated the discretization method proposed in (Kwedlo & Kretowski 1999). The combination of these two approaches enabled us to uniformly represent the association rules using the Michigan approach. The decoding is performed as follows:

DV = mnv + (mxv − mnv) · ( Σ_{i=1}^{n} 2^(i−1) · bit_i ) / (2^n − 1)    (1)

where DV is the decoded value; 1 ≤ i ≤ n and n is the number of bits used for encoding; mnv and mxv are the minimum and maximum values of the attribute, respectively; and bit_i is the value of the bit in position i. It is important to note that, with the encoding of the rules in this algorithm, the consequent part of a rule is not important at the start of the run, because the consequent should not be randomly determined. The consequent part is determined when the fitness of the rule is computed.

2.2 Initialization

We avoid generating the initial population purely randomly because that may result in rules that cover no training data instance and thereby have very low fitness. Furthermore, a population with rules that are guaranteed to cover at least one training instance can lead to over-fitting the data. It was shown in (Surry & Radcliffe 1996) that utilizing non-random initialization can lead to an improvement in the quality of the solution and can drastically reduce the runtime. We therefore designed an initialization method which includes choosing a training instance to act as a "seed" for rule generation, as proposed in (Freitas 2002). In our initialization approach, a seed should be a data instance lying in the middle of a cluster of examples with a common consequent. The training examples are stored in an array, and iteratively for each example we calculate the ratio of the examples that cover the consequent to those that negate the coverage of that consequent (same-consequent/opposite-consequent), ρ, as in (2).

ρ = Count(Same_consequent) / Count(Opposite_consequent)    (2)

where Count(Same_consequent) denotes the number of same-consequent examples and Count(Opposite_consequent) denotes the number of opposite-consequent examples. The training example with the highest ρ is selected as the seed.

2.3 Reproduction

The reproduction mechanism involves rule selection and the application of the crossover operators. The rule selection method used by this algorithm follows the "universal suffrage" approach proposed in (Giordana et al. 1997). With this approach each association rule is represented by a single individual. The individuals to be mated are elected by training data instances: each instance votes for a rule that it covers in a stochastic, fitness-based way. As an example, let us assume we have a set R of rules or chromosomes that cover a given instance i, i.e. rules whose antecedent and consequent are satisfied by the instance i. The instance i then votes for one of the rules in R using a roulette wheel selection scheme. This means that each rule r in R is assigned a roulette-wheel slot whose size is proportional to the fitness of r divided by the sum of the fitness of all rules in R. The better the fitness of a given rule r, the higher its probability of being selected over the other rules covering the given instance i. In the event that no rule covers i, the algorithm automatically creates a new rule using the seeding operator used at the population initialization stage. Since only the rules that cover a given instance compete with each other, this results in a kind of niching. Niching is a mechanism through which evolutionary algorithms form and maintain subpopulations, or niches. Niching fosters the evolution of several different rules, each covering a different part of the data being mined. This helps avoid the convergence of the population to a single rule, resulting in the discovery of a set of rules rather than a single one.

The actual reproduction takes place by performing a multi-point crossover and mutation on the new individuals.

The Crossover Operator  We modified the standard crossover operator so that it either generalizes a rule if the rule is too specific, or specializes it if the rule is too general. A rule is considered too specific if it covers too few data instances, i.e. when too few data instances satisfy both the antecedent and the consequent of the rule. In contrast, a rule is considered too general when it covers too many data instances. We make use of the bitwise OR and the bitwise AND to implement generalization and specialization, respectively. The bitwise OR and the bitwise AND are applied to the antecedent part of the rule, since this determines the consequent part (Giordana et al. 1997).

The generalization/specialization crossover procedure first constructs an index, I = {i_1, ..., i_n}, of pointers to the positions in the chromosome (bit-string) where the corresponding bits in the two parents have different values. Then, for every element i_k ∈ I the following procedure is repeated: if the rule is too specific, the algorithm replaces the value of the bit b(i_k) with the logical OR of the corresponding bits in the parents; otherwise, if the rule is too general, it replaces the value of the bit b(i_k) with the logical AND of the corresponding bits in the parents. The crossover operator is equipped with a mechanism for detecting and eliminating invalid genotypes.
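The two-bit Michigan encoding of Section 2.1 and the decoding formula (1) can be sketched as follows. This is a minimal illustration, not the paper's code; the function names are ours, and the Figure 1 bit-string is completed with the consequent attribute E that the example rule ACF → BE requires.

```python
def decode_value(bits, mnv, mxv):
    """Decode an n-bit attribute encoding into [mnv, mxv], per equation (1):
    DV = mnv + (mxv - mnv) * sum(2^(i-1) * bit_i) / (2^n - 1)."""
    n = len(bits)
    weighted = sum((2 ** i) * b for i, b in enumerate(bits))  # bit 1 has weight 2^0
    return mnv + (mxv - mnv) * weighted / (2 ** n - 1)

# Two-bit role tags per attribute: 00 -> antecedent, 11 -> consequent,
# 01 or 10 -> attribute absent from the rule.
ROLE = {(0, 0): "antecedent", (1, 1): "consequent",
        (0, 1): "absent", (1, 0): "absent"}

def decode_rule(tagged):
    """tagged: list of ((b1, b2), attribute) pairs, as in Figure 1."""
    antecedent = [a for tag, a in tagged if ROLE[tag] == "antecedent"]
    consequent = [a for tag, a in tagged if ROLE[tag] == "consequent"]
    return antecedent, consequent

# The rule ACF -> BE of Figure 1 (attribute D absent):
chromosome = [((0, 0), "A"), ((1, 1), "B"), ((0, 0), "C"),
              ((0, 1), "D"), ((1, 1), "E"), ((0, 0), "F")]
ante, cons = decode_rule(chromosome)
print(ante, cons)  # ['A', 'C', 'F'] ['B', 'E']
```

For a continuous attribute, `decode_value([1, 1, 1, 1], 0, 15)` returns the maximum value 15.0, since all four bits are set.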
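Under our reading of the crossover description (bitwise OR generalizes a too-specific rule, bitwise AND specializes a too-general one), the generalization/specialization crossover can be sketched as follows. The covering test and the invalid-genotype repair mentioned in the text are omitted, and the function name is illustrative.

```python
def gen_spec_crossover(parent1, parent2, too_general):
    """Generalization/specialization crossover sketch (Section 2.3).

    parent1, parent2: equal-length bit lists (antecedent part of the rules).
    too_general: True  -> specialize the offspring with bitwise AND;
                 False -> the rule is too specific, generalize with bitwise OR.
    Only positions where the parents disagree (the index set I) are
    recombined; positions where they agree are inherited unchanged.
    """
    index = [i for i, (a, b) in enumerate(zip(parent1, parent2)) if a != b]
    child = list(parent1)
    for i in index:
        if too_general:
            child[i] = parent1[i] & parent2[i]  # AND: drop the disputed condition bits
        else:
            child[i] = parent1[i] | parent2[i]  # OR: keep the disputed condition bits
    return child

print(gen_spec_crossover([0, 0, 1, 1, 0, 1], [0, 1, 1, 0, 0, 1],
                         too_general=True))  # [0, 0, 1, 0, 0, 1]
```

At every disputed position one parent holds 0 and the other 1, so AND always clears the bit and OR always sets it; the operator therefore moves the offspring monotonically toward a more general or a more specific antecedent.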
The Mutation Operator  The mutation operator helps maintain diversity within the population and prevents premature convergence to local optima (Liu & Kwok 2000b). The mutation operator needs to be designed in such a way that we avoid the population being dominated by a single highly fit individual. Our approach to this problem is to use an adaptive mutation probability, where the value of the mutation probability is varied as the population becomes dominated by an individual. We adopted the non-uniform mutation operator proposed in (Michalewicz 1999), which adapts to the environment by varying as the fitness of the individuals in the population changes. We modified the non-uniform mutation operator to enable it to generalize and/or specialize a rule condition. The mutation operator randomly selects a condition from the rule. If that condition involves a nominal attribute, the value is randomly changed from one value to another; if the attribute is continuous, mutation randomly changes the condition's interval values. The specialization mutation operator works on a randomly selected condition in the rule; if the condition involves a continuous attribute, specialization shrinks the interval.

The mutation operator used here ensures that high mutation probabilities do not cause the loss of the fittest individual of a generation. Furthermore, the mutation operator can cause the undesired effect of changing the rule's consequent if applied before generating the consequent, but in our case the consequent is generated after the mutation operator has been applied.

2.4 Replacement

We use an elitist individual replacement approach that ensures that more fit genotypes are always introduced into the population.

Uniqueness testing  The application of the genetic operators to the parent population may result in identical genotypes in the population. The algorithm therefore tests that new offspring do not duplicate any existing member of the population. There is, however, a computational overhead imposed by the uniqueness testing process on the operation of the algorithm. This overhead is compensated by a reduction in the number of genotype evaluations required, because the check for duplicates can be performed before fitness evaluation; the saving in the number of fitness evaluations significantly increases efficiency. Furthermore, adopting a replacement strategy with genotype uniqueness enforced preserves genetic diversity within the population (Lima et al. 2008). Genetic diversity is significant because crossover-type operators depend on recombining genetically dissimilar genotypes to make fitness gains. Uniqueness of the genotypes also permits the algorithm to find not only the single best individual but the n best genotypes in a single run, where n is the population size. The process of finding the best n individuals imposes some computational load, but it is compensated by the generation of n high-fitness individuals in a single program run.

Fitness Evaluation  MOGAMAR performs the fitness evaluation of the generated rules using a set of five complementary metrics: confidence, support, interestingness, lift and J-Measure. These metrics are converted into an objective fitness function with user-defined weights. The support, σ(X), of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. The confidence factor, or predictive accuracy, of a rule is the conditional probability of the consequent given the antecedent, calculated as in (3).

confidence = σ(X ∪ Y) / σ(X)    (3)

We adopt the interestingness metric calculation proposed in (Freitas 1998). The algorithm first calculates the information gain, InfoGain(A_i), of each attribute A_i. Then the interestingness of the rule antecedent, RAI, is calculated by an information-theoretical measure as (4).

RAI = 1 − ( Σ_{i=1}^{n} InfoGain(A_i) / n ) / log2(|G_k|)    (4)

The degree of interestingness of the rule consequent (CAI) is as (5):

CAI = (1 − Pr(G_kl))^(1/β)    (5)

where Pr(G_kl) is the prior probability of the goal attribute value G_kl, β is a user-specified parameter, and 1/β is a measure for reducing the influence of the rule consequent interestingness in the value of the fitness function.

The interestingness of the rule is given by (6):

interestingness = (RAI + CAI) / 2    (6)

The J-Measure shows how dissimilar the a priori and a posteriori beliefs about a rule are; useful rules imply a high degree of dissimilarity. In rule inference we are interested in the distribution of the rule "implication" variable Y, and its two events y and the complement ȳ. The purpose is to measure the difference between the a priori distribution f(y), i.e. f(Y = y) and f(Y ≠ y), and the a posteriori distribution f(Y | X). The J-Measure is calculated as (7):

JM = f(x) · [ f(y|x) · ln( f(y|x) / f(y) ) + (1 − f(y|x)) · ln( (1 − f(y|x)) / (1 − f(y)) ) ]    (7)

Lift is the ratio of the observed support to that expected if X and Y were statistically independent, and it is defined as (8):

lift(X → Y) = σ(X ∪ Y) / ( σ(X) · σ(Y) )    (8)

Finally, the fitness function is calculated as the arithmetic weighted average of confidence, support, interestingness, lift and J-Measure.
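The support-based metrics above can be sketched from transaction counts as follows. This is a simplified sketch under our reading of equations (3), (7) and (8): the function names are ours, the information-gain-based interestingness terms (4)-(6) are omitted because they need per-attribute information gain, and degenerate posteriors are skipped rather than handled.

```python
import math

def rule_metrics(n, n_x, n_y, n_xy):
    """Confidence (3), lift (8) and J-Measure (7) of a rule X -> Y, from
    transaction counts: n total, n_x covering X, n_y covering Y, n_xy both."""
    s_x, s_y, s_xy = n_x / n, n_y / n, n_xy / n      # supports
    confidence = s_xy / s_x                           # eq. (3)
    lift = s_xy / (s_x * s_y)                         # eq. (8)
    f_y_x = confidence                                # f(y|x)
    # eq. (7): divergence between the posterior f(y|x) and the prior f(y),
    # weighted by f(x); degenerate cases (probabilities of 0 or 1) skipped.
    jm = 0.0
    if 0 < f_y_x < 1 and 0 < s_y < 1:
        jm = s_x * (f_y_x * math.log(f_y_x / s_y)
                    + (1 - f_y_x) * math.log((1 - f_y_x) / (1 - s_y)))
    return {"support": s_xy, "confidence": confidence,
            "lift": lift, "jmeasure": jm}

def weighted_fitness(metrics, weights):
    """Arithmetic weighted average of the supplied metrics (cf. eq. (9))."""
    return sum(weights[k] * metrics[k] for k in weights) / sum(weights.values())

m = rule_metrics(n=1000, n_x=200, n_y=400, n_xy=180)
print(m["confidence"], m["lift"])  # 0.9 2.25
```

A lift above 1, as here, indicates that the antecedent and consequent co-occur more often than independence would predict.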
The fitness function, f(x), is given by (9):

f(x) = ( w_s·S + w_c·C + w_i·I + w_l·L + w_J·JM ) / ( w_s + w_c + w_i + w_l + w_J )    (9)

where S denotes support, C denotes confidence, I denotes interestingness, L denotes lift, JM denotes J-Measure, and their respective user-defined weights are w_s, w_c, w_i, w_l, w_J. The chromosomes are then ranked depending on their fitness.

Selection Fitness Calculation  Using rank-based linear normalization we convert the fitness values into selection-fitness values. With rank-based linear normalization, a linear increment is used to set the selection-fitness values of the genotypes based on their rank. The main reason for using rank-based linear normalization is that it provides explicit control over the selection pressure applied to the population and reduces the likelihood that the population will converge prematurely on a sub-optimal solution (Metzler 2005). This helps avoid problems caused by super-individuals dominating the selection process.

2.5 Criteria for Termination

The algorithm terminates execution when the degeneracy condition is met, i.e. when the best and worst performing chromosomes in the population differ by less than 0.1%. It also terminates execution when the total number of generations specified by the user is reached.

3 Algorithm Performance

In this section we evaluate the relative performance of the algorithm and compare it to that of the Data Mining by Evolutionary Learning (DMEL) algorithm (Au, Chan, & Yao 2003). DMEL is known to be one of the best performing evolutionary algorithms for association rule mining (Reynolds & de la Iglesia 2008). The performance characteristics studied included the quality of the solutions found and CPU utilization.

3.1 Datasets

To evaluate our algorithm we used the Adult, Connect-4, Chess, Diabetes, DNA and Mushroom datasets from the UCI repository of machine learning databases (Frank & Asuncion 2010). We also used the Motor Vehicle Licensing System (MVLS) database from the Uganda Revenue Authority (URA), which we processed for use in the experiments. The MVLS database contains data pertaining to motor vehicle licensing fees, vehicle ownership and registration, transfer of ownership, accidents, usage, country of origin, and date or year of manufacture. The MVLS has over 4 million records, but we randomly selected 441,327 transactions for these experiments. A summary of the datasets is given in Table 1.

Dataset     No. of Instances   Attributes
Adult       48,842             14
Chess       3,196              36
Connect-4   67,557             42
Diabetes    768                20
DNA         3,190              62
Mushroom    8,124              22
MVLS        441,327            32

Table 1: Summary of the Datasets

3.2 Relative Performance of the Algorithms

A summary of the results of our experiments with the fixed parameters is given in Table 2. The results include the average rule quality metrics and CPU utilization.

Dataset     MOGAMAR                    DMEL
            Avg. Quality  Time (Secs)  Avg. Quality  Time (Secs)
Adult       89.45         1.35         89.7          1.34
Connect-4   87.3          12           88.2          19
Chess       97.35         0.01         95.6          0.015
Diabetes    87.75         0.015        79.8          0.017
DNA         96.0          1.23         95.4          1.21
Mushroom    89.4          0.03         88.5          0.04
MVLS        82.4          900          83.6          903

Table 2: Quality of rules and run times

From Table 2, we observe that:
1. The discovered rules have a very high average quality value for all datasets, implying that both are good algorithms.
2. The algorithms are generally fast for smaller datasets, with average CPU times increasing for larger datasets.
3. The algorithms generally produce poorer quality rules for large datasets.

Our overall observation is that the quality of the generated rules decreases as the dataset dimensions increase.

It can also be observed that the algorithms consumed quite a bit of time on the Connect-4 and MVLS datasets. These two datasets have a much larger number of transactions and attributes compared to the rest of the datasets. We also observed that the rules with high confidence within these datasets have very low support. For instance, the rules that have a 'tie' as their consequent and a confidence of 100% in the Connect-4 dataset have a support of only 14 records. This profoundly affects the time it takes to process the records. When we set a low minimum support, the algorithm's response time greatly improves.

3.3 Sensitivity to Dataset Size

In Section 3.2 we used fixed dataset sizes to assess the performance of the algorithms. With a fixed dataset size it may not be possible to gauge how well the algorithm performs when the datasets grow. We now study the difference in performance of the algorithm with increasing dataset sizes. Figure 2 summarizes the trends in the performance of the algorithms with varying dataset sizes.
Figure 2: Relative Performance of the algorithms

From Figure 2 we observe that an increase in dataset size leads to an increase in the average response time of the algorithms. Furthermore, when the data are increased two-fold, there is a multiplier effect on the response time of the algorithms. This implies that the performance of the ARM algorithm depends strongly on the size of the data being mined.

3.4 Sensitivity to Genetic Operators and Parameters

The experiments we carried out in Section 3.2 used specific parameter instances. It is possible that variations in the parameter values can lead to either an improvement or a deterioration in the performance of the algorithms. In this subsection we study more deeply the effect of the crossover and mutation operators on the performance of the algorithms.

Effect of Crossover Operator  We studied the performance of MOGAMAR and DMEL with different values of the crossover rate. We observed that the overall performance of both algorithms with different crossover rates is only slightly different from that shown when the value of the crossover rate is fixed. This implies that the crossover rate does not have a very profound effect on the performance of the algorithm.

Effect of Mutation Operator  Figure 3 shows the performance of the algorithms with varying mutation rates.

Figure 3: Relative Performance of MOGAMAR and DMEL for different mutation rates

It can be seen that their performance was drastically affected by increasing mutation rates, implying that the mutation operator has a high impact on the performance of genetic algorithms. As the mutation rates increased, CPU utilization drastically increased, showing a big reduction in algorithm efficiency, and the disparity in the characteristics of the algorithms became more evident. The main cause of this phenomenon is that the mutation operator reintroduces useful genotypes into the population for a diverse pool of parents. This increases the diversity of the population because each structure in the new population undergoes a random change with a probability equal to the mutation rate. This prevents members of the population from converging to a single very fit individual, thereby increasing the required number of evaluations. This implies that the mutation operator needs to be investigated more thoroughly to establish the most suitable rates.

4 Conclusion

In this paper we have proposed a new association rule mining algorithm, MOGAMAR. The approach proposed in this paper incorporates a novel population initialization technique that ensures the production of high quality individuals; specifically designed breeding operators that ensure the elimination of defective genotypes; an adaptive mutation probability that ensures genetic diversity of the population; and uniqueness testing. The performance of MOGAMAR has been compared with DMEL, an efficient ARM algorithm, on a number of benchmark datasets, with experimental results showing that our proposed approach yields equally good performance with consistently high quality rules. MOGAMAR provides the user with rules scored on five interestingness metrics, and this number can easily be increased if need be by modifying the fitness function. We studied the effect of parameters on the performance of the algorithm, specifically dataset size and the crossover and mutation rates. We observed that the algorithm performs poorly for large dataset sizes, and the poor performance becomes more evident as the datasets increase in size; this has been identified as an area requiring more research effort. It was further observed that the crossover rate does not have a big impact on performance, while the mutation rate does. This indicates that it is necessary to find methods for identifying mutation rates that enhance the performance of the algorithm, another area we shall be researching. Finally, we have used a weighted fitness function for finding the best rules, but this approach may not be the best when there are several solution-quality criteria to be evaluated, because these criteria may be non-commensurate or conflicting, especially when they evaluate different aspects of a candidate solution. In our future work we shall specifically look into the possibility of modeling the ARM problem differently.

References

Agrawal, R.; Imielinski, T.; and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., 26-28.

Au, W.; Chan, K.; and Yao, X. 2003. A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation 7(6).

Carvalho, D. R.; Freitas, A. A.; and Ebecken, N. F. F. 2005. Evaluating the correlation between objective rule interestingness measures and real human interest. In PKDD, 453-461.

Davis, L. 1991. Handbook of Genetic Algorithms. VNR Computer Library, Van Nostrand Reinhold, New York.

Dehuri, S.; Patnaik, S.; Ghosh, A.; and Mall, R. 2008. Application of elitist multi-objective genetic algorithm in classification rule generation. Applied Soft Computing Journal 8:477-487.

Frank, A., and Asuncion, A. 2010. UCI machine learning repository.

Freitas, A. A. 1998. On objective measures of rule surprisingness. In Principles of Data Mining and Knowledge Discovery, 2nd European Symposium, PKDD'98, Nantes, France, 1083-1090.

Freitas, A. A. 1999. On rule interestingness measures. Knowledge-Based Systems 12:309-315.

Freitas, A. A. 2002. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, Berlin Heidelberg.

Freitas, A. A. 2007. A review of evolutionary algorithms for data mining. Soft Computing, Knowledge Discovery and Data Mining.

Ghosh, A., and Nath, B. 2004. Multi-objective rule mining using genetic algorithms. Information Sciences 163:123-133.

Giordana, A.; Anglano, C.; Bello, G. L.; and Saitta, L. 1997. A network genetic algorithm for concept learning. In Proceedings of the Sixth International Conference on Genetic Algorithms, 436-443. Morgan Kaufmann.

Hruschka, E. R.; Campello, R. J.; Freitas, A. A.; and de Carvalho, A. C. P. L. F. 1999. Survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics 12:309-315.

Kotsiantis, S., and Kanellopoulos, D. 2006. Association rules mining: A recent overview. GESTS International Transactions on Computer Science and Engineering 32(1):71-82.

Kwedlo, W., and Kretowski, M. 1999. An evolutionary algorithm using multivariate discretization for decision rule induction. In Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '99).

Lima, C. F.; Pelikan, M.; Goldberg, D.; Lobo, F. G.; Sastry, K.; and Hauschild, M. 2008. Influence of selection and replacement strategies on linkage learning in BOA. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress, 1083-1090.

Liu, J., and Kwok, J. 2000a. An extended genetic rule induction algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation.

Liu, J., and Kwok, J. 2000b. An extended genetic rule induction algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation, volume 1, 458-463.

Metzler, D. 2005. Direct maximization of rank-based metrics. Technical report, University of Massachusetts, Amherst.

Michalewicz, Z. 1999. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New York.

Reynolds, A., and de la Iglesia, B. 2008. A multi-objective GRASP for partial classification. Soft Computing 13.

Surry, P., and Radcliffe, N. 1996. Inoculation to initialize evolutionary search. In Fogarty, T. C., ed., Evolutionary Computing: AISB Workshop, Springer, Brighton, U.K.