Mining High Quality Association Rules Using Genetic Algorithms

Peter P. Wakabi-Waiswa and Venansius Baryamureeba
Faculty of Computing & Information Technology, Makerere University, Kampala, Uganda
Email: pwakabi@hotmail.com, barya@cit.mak.ac.ug

Abstract

The association rule mining (ARM) problem is a structured mechanism for unearthing hidden facts in large data sets and for drawing inferences on how a subset of items influences the presence of another subset. ARM is computationally very expensive because the number of rules grows exponentially as the number of items in the database increases, and this growth is exacerbated further when the data dimensions increase. The problem becomes more complex still when different rule quality metrics have to be taken into account. In this paper we propose a genetic algorithm (GA) that generates high quality association rules with five rule quality metrics. We study the performance of the algorithm, and the experimental results show that it produces high quality rules in good computational times.

Keywords: confidence; support; interestingness; lift; J-Measure; genetic algorithms

1 Introduction

Since the association rule mining problem was proposed by (Agrawal, Imielinski, & Swami 1993), several algorithms have been developed for it. In the most commonly used approach, the rule generation process is split into two separate steps. The first step applies a minimum support threshold to find all frequent itemsets in the database, and the second applies a minimum confidence constraint to the frequent itemsets to form the rules. The first step is computationally very expensive because finding all frequent itemsets involves searching over all possible itemsets, and the set of all possible itemsets grows exponentially with the number of items in the database. This exponential growth profoundly affects the performance of association rule mining algorithms (Agrawal, Imielinski, & Swami 1993).

Beyond these difficulties, most existing algorithms have been designed to produce simple, easy to understand association rules, and they measure the quality of the generated rules by considering only one or two evaluation criteria, most commonly the confidence factor or predictive accuracy (Dehuri et al. 2008). Such algorithms provide useful tools for descriptive data mining, but there are several other measures of rule quality, such as comprehensibility, J-measure, surprise, gain, chi-squared value, gini, entropy gain, Laplace, lift and conviction, that can be used to evaluate the rules (Carvalho, Freitas, & Ebecken 2005), (Freitas 1999).

Quite a number of research works have been carried out in this arena, but the results indicate that more innovative approaches are needed, with a view to finding algorithms that can handle multiple and growing sets of rule quality metrics while also improving algorithm efficiency (Hruschka et al. 1999), (Kotsiantis & Kanellopoulos 2006).

In this paper we propose a Multi-Objective Genetic Algorithm for Mining Association Rules (MOGAMAR), which generates association rules scored on five rule quality metrics: confidence, support, interestingness, lift and J-Measure. This permits the user to evaluate association rules on these different quality metrics in a single algorithm run. The main motivation for using a genetic algorithm (GA) is that a GA performs a global search and copes better with attribute interaction than the greedy rule induction algorithms often used in data mining tasks (Freitas 2007). Genetic algorithms are also robust, with little likelihood of getting stuck in local optima, and they are highly parallel in nature, making them good candidates for distributed implementations (Liu & Kwok 2000a).

The rest of this paper is organized as follows. We present a detailed description of the proposed algorithm in Section 2. In Section 3 we evaluate the performance of the algorithm, and in Section 4 we conclude the paper.

2 The Multi-Objective Genetic Algorithm for Mining Association Rules

In this work we use the underlying structure of the object-oriented genetic algorithm proposed in (Davis 1991), with modifications to the representation of the individuals. In the rest of this section we give a detailed description of the proposed algorithm.

2.1 Representation of the Rules

We adopted a modified Michigan approach proposed in (Ghosh & Nath 2004), in which the encoding/decoding scheme associates two bits with each attribute in the database. If these two bits are 00, the attribute that follows them appears in the antecedent part of the rule; if they are 11, the attribute appears in the consequent part. The other two combinations, 01 and 10, indicate the absence of the attribute from either part. As an example, suppose there is a rule ACF → BE; it is represented as shown in Figure 1.

00 A 11 B 00 C 01 D 11 E 00 F

Figure 1: Modified Michigan Rule Representation

Following this approach the algorithm can handle variable-length rules with good storage efficiency, adding an overhead of only 2k bits, where k is the number of attributes in the database.
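As an illustration of the two-bit scheme just described, the following sketch (ours, not the authors' implementation) decodes a sequence of (bit-pair, attribute) entries back into antecedent and consequent sets; the pair-list representation is an assumption made for readability:

```python
# Illustrative sketch of the two-bit Michigan encoding of Section 2.1:
# 00 places the following attribute in the antecedent, 11 in the
# consequent, and 01/10 mean the attribute is absent from the rule.

ANTECEDENT, CONSEQUENT = "00", "11"

def decode_rule(encoding):
    """Split a list of (bits, attribute) pairs into antecedent and consequent."""
    antecedent, consequent = [], []
    for bits, attribute in encoding:
        if bits == ANTECEDENT:
            antecedent.append(attribute)
        elif bits == CONSEQUENT:
            consequent.append(attribute)
        # "01" or "10": the attribute does not appear in the rule
    return antecedent, consequent

# The rule ACF -> BE over attributes A..F (cf. Figure 1):
rule = [("00", "A"), ("11", "B"), ("00", "C"),
        ("01", "D"), ("11", "E"), ("00", "F")]
print(decode_rule(rule))  # (['A', 'C', 'F'], ['B', 'E'])
```

Note that a genotype of 2k bit-pairs over the k database attributes suffices to express any rule, which is the storage overhead mentioned above.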
The downside of this approach is that it is not well suited to handling continuous-valued attributes. For real-valued attributes we have therefore incorporated the discretization method proposed in (Kwedlo & Kretowski 1999). The combination of these two approaches enabled us to represent the association rules uniformly using the Michigan approach. The decoding is performed as follows:

DV = mnv + (mxv - mnv) * [ Σ_{i=1}^{n} 2^(i-1) * bit_i ] / (2^n - 1)    (1)

where DV is the decoded value; 1 ≤ i ≤ n, with n the number of bits used for the encoding; mnv and mxv are the minimum and maximum values of the attribute, respectively; and bit_i is the value of the bit in position i.

It is important to note that, with the rule encoding used in this algorithm, the consequent part of a rule is not important at the start of a run, because the consequent should not be randomly determined. The consequent part is determined when the fitness of the rule is computed.
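Equation (1) can be checked with a small numeric sketch (our reading of the formula, not the authors' code): an all-zeros bit string decodes to the attribute minimum and an all-ones string to the maximum, with intermediate bit patterns spread linearly in between.

```python
# Sketch of equation (1): decode an n-bit string into a real value
# in [mnv, mxv], where bits[0] holds the bit for position i = 1.

def decode_value(bits, mnv, mxv):
    """DV = mnv + (mxv - mnv) * (sum_i 2^(i-1) * bit_i) / (2^n - 1)."""
    n = len(bits)
    weighted = sum(2 ** (i - 1) * b for i, b in enumerate(bits, start=1))
    return mnv + (mxv - mnv) * weighted / (2 ** n - 1)

print(decode_value([0, 0, 0, 0], 10.0, 20.0))  # 10.0 (all zeros -> mnv)
print(decode_value([1, 1, 1, 1], 10.0, 20.0))  # 20.0 (all ones  -> mxv)
```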
2.2 Initialization

We avoid generating the initial population purely at random because a random population may contain rules that cover no training instance and therefore have very low fitness. Furthermore, a population of rules that are guaranteed to cover at least one training instance can lead to over-fitting the data. It was shown in (Surry & Radcliffe 1996) that non-random initialization can improve the quality of the solution and drastically reduce the runtime. We therefore designed an initialization method in which a training instance is chosen to act as a "seed" for rule generation, as proposed in (Freitas 2002). In our initialization approach, a seed should be a data instance lying in the middle of a cluster of examples with a common consequent. The training examples are stored in an array, and for each example we iteratively calculate the ratio ρ of the examples that share its consequent to those that negate it (same-consequent/opposite-consequent), as in (2):

ρ = Count(Same_consequent) / Count(Opposite_consequent)    (2)

where Count(Same_consequent) denotes the number of same-consequent examples and Count(Opposite_consequent) the number of opposite-consequent examples. The training example with the highest ρ is selected as the seed.

Fitness Evaluation  MOGAMAR performs the fitness evaluation of the generated rules using a set of five complementary metrics: confidence, support, interestingness, lift and J-Measure. These metrics are combined into an objective fitness function with user-defined weights. The support σ(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset. The confidence factor, or predictive accuracy, of a rule is the conditional probability of the consequent given the antecedent, calculated as in (3):

confidence = σ(X ∪ Y) / σ(X)    (3)

We adopt the interestingness calculation proposed in (Freitas 1998). The algorithm first calculates the information gain InfoGain(A_i) of each attribute A_i. The interestingness of the rule antecedent, RAI, is then calculated by an information-theoretical measure as in (4):

RAI = 1 - [ ( Σ_{i=1}^{n} InfoGain(A_i) ) / n ] / log2(|G_k|)    (4)

The degree of interestingness of the rule consequent, CAI, is given by (5):

CAI = (1 - Pr(G_kl))^(1/β)    (5)

where Pr(G_kl) is the prior probability of the goal attribute value G_kl, β is a user-specified parameter, and the exponent 1/β reduces the influence of the rule-consequent interestingness on the value of the fitness function. The interestingness of the rule is given by (6):

interestingness = (RAI + CAI) / 2    (6)

The J-Measure expresses how dissimilar the a priori and a posteriori beliefs about a rule are; useful rules imply a high degree of dissimilarity. In rule inference we are interested in the distribution of the rule "implication" variable Y, with its two events y and the complement ȳ. The purpose is to measure the difference between the a priori distribution f(y), i.e. f(Y = y) and f(Y ≠ y), and the a posteriori distribution f(y | x). The J-Measure is calculated as (7):

JM = f(x) * [ f(y|x) * ln( f(y|x) / f(y) ) + (1 - f(y|x)) * ln( (1 - f(y|x)) / (1 - f(y)) ) ]    (7)

Lift is the ratio of the observed support to that expected if X and Y were statistically independent, and is defined as (8):

lift(X → Y) = σ(X ∪ Y) / (σ(X) * σ(Y))    (8)

Finally, the fitness function is calculated as the weighted arithmetic average of confidence, support, interestingness, lift and J-Measure. The fitness function f(x) is given by (9):

f(x) = (ws*S + wc*C + wi*I + wl*L + wJ*JM) / (ws + wc + wi + wl + wJ)    (9)

where S denotes support, C confidence, I interestingness, L lift, JM the J-Measure, and ws, wc, wi, wl, wJ their respective user-defined weights. The chromosomes are then ranked according to their fitness.

2.3 Reproduction

The reproduction mechanism involves rule selection followed by the application of the crossover and mutation operators. The rule selection method used by this algorithm follows the "universal suffrage" approach proposed in (Giordana et al. 1997). With this approach each association rule is represented by a single individual, and the individuals to be mated are elected by the training data instances: each instance votes, in a stochastic fitness-based way, for one rule that covers it. As an example, assume we have a set R of rules (chromosomes) that cover a given instance i, i.e. rules whose antecedent and consequent are both satisfied by i. The instance i then votes for one of the rules in R using a roulette-wheel selection scheme: each rule r in R is assigned a roulette-wheel slot whose size is proportional to the fitness of r divided by the sum of the fitnesses of all rules in R. The better the fitness of a rule r, the higher its probability of being selected over the other rules covering instance i. If no rule covers i, the algorithm automatically creates a new rule using the seeding operator employed at the population initialization stage.

Since only the rules that cover a given instance compete with each other, this results in a form of niching. Niching is a mechanism through which evolutionary algorithms form and maintain subpopulations, or niches. It fosters the evolution of several different rules, each covering a different part of the data being mined, and helps prevent convergence of the population to a single rule, resulting in the discovery of a set of rules rather than a single one. The actual reproduction takes place by performing multi-point crossover and mutation on the new individuals.

The Crossover Operator  We modified the standard crossover operator either to generalize a rule if it is too specific or to specialize it if it is too general. A rule is considered too specific if it covers too few data instances, i.e. when too few data instances satisfy both its antecedent and its consequent; in contrast, a rule is considered too general when it covers too many data instances. We use the bitwise OR and the bitwise AND to implement generalization and specialization, respectively. Both are applied to the antecedent part of the rule, since this determines the consequent part (Giordana et al. 1997).

The generalization/specialization crossover procedure first constructs an index I = {i_1, ..., i_n} of pointers to the positions in the chromosome (bit-string) where the corresponding bits in the two parents have different values. Then, for every element i_k ∈ I, the following procedure is repeated: if the rule is too specific, the algorithm replaces the value of the bit b(i_k) with the logical OR of the corresponding bits in the parents (generalization); otherwise, if the rule is too general, it replaces the value of the bit b(i_k) with the logical AND of the corresponding bits in the parents (specialization). The crossover operator is equipped with a mechanism for detecting and eliminating invalid genotypes.

The Mutation Operator  The mutation operator helps maintain diversity within the population and prevents premature convergence to local optima (Liu & Kwok 2000b). It needs to be designed in such a way that the population does not become dominated by a single highly fit individual. Our approach is to use an adaptive mutation probability whose value is varied as the population becomes dominated by an individual. We adopted the non-uniform mutation operator proposed in (Michalewicz 1999), which adapts to the environment by varying as the fitness of the individuals in the population changes, and we modified it so that it can generalize and/or specialize a rule condition. The mutation operator randomly selects a condition from the rule. If the condition involves a nominal attribute, its value is randomly changed to another value; if the attribute is continuous, mutation randomly changes the interval values of the condition. The specialization mutation operator works on a randomly selected condition in the rule; if the condition involves a continuous attribute, specialization shrinks the interval. The mutation operator used here ensures that high mutation probabilities do not cause the loss of the fittest individual of a generation. Furthermore, the mutation operator can cause the undesired effect of changing a rule's consequent if it is applied before the consequent is generated, but in our case the consequent is generated after the mutation operator has been applied.

2.4 Replacement

We use an elitist individual replacement approach that ensures that fitter genotypes are always introduced into the population.

Uniqueness testing  The application of the genetic operators on the parent population may produce genotypes identical to ones already in the population. The algorithm therefore first checks that new offspring do not duplicate any existing member of the population. Uniqueness testing imposes a computational overhead, but this is compensated by a reduction in the number of genotype evaluations required, because the check for duplicates can be performed before fitness evaluation; the resulting saving in fitness evaluations significantly increases efficiency. Furthermore, a replacement strategy with genotype uniqueness enforced preserves genetic diversity within the population (Lima et al. 2008). Genetic diversity is significant because crossover-type operators depend on recombining genetically dissimilar genotypes to make fitness gains. Uniqueness of the genotypes also permits the algorithm to find not only the single best individual but the n best genotypes in a single run, where n is the population size. Finding the best n individuals imposes some computational load, but this is compensated by the generation of n high-fitness individuals in a single program run.

Selection Fitness Calculation  Using rank-based linear normalization, we convert the fitness values into selection-fitness values: a linear increment is used to set the selection-fitness values of the genotypes based on their rank. The main reason for using rank-based linear normalization is that it provides explicit control over the selection pressure applied to the population and reduces the likelihood that the population will converge prematurely on a sub-optimal solution (Metzler 2005). This helps avoid problems caused by super-individuals dominating the selection process.

2.5 Criteria for Termination

The algorithm terminates when the degeneracy condition is met, i.e. when the best and worst performing chromosomes in the population differ by less than 0.1%. It also terminates when the total number of generations specified by the user is reached.
3 Algorithm Performance

In this section we evaluate the performance of MOGAMAR and compare it to that of the Data Mining by Evolutionary Learning (DMEL) algorithm (Au, Chan, & Yao 2003). DMEL is known to be one of the best performing evolutionary algorithms for association rule mining (Reynolds & de la Iglesia 2008). The performance characteristics studied are the quality of the solutions found and CPU utilization.

3.1 Datasets

To evaluate our algorithm we used the Adult, Connect-4, Chess, Diabetes, DNA and Mushroom datasets from the UCI repository of machine learning databases (Frank & Asuncion 2010). We also used the Motor Vehicle Licensing System (MVLS) database from the Uganda Revenue Authority (URA), which we processed for use in the experiments. The MVLS database contains data pertaining to motor vehicle licensing fees, vehicle ownership and registration, transfer of ownership, accidents, usage, country of origin, and date or year of manufacture. The MVLS has over 4 million records, but we randomly selected 441,327 transactions for these experiments. A summary of the datasets is given in Table 1.

Dataset     No. of Instances   Attributes
Adult            48,842            14
Chess             3,196            36
Connect-4        67,557            42
Diabetes            768            20
DNA               3,190            62
Mushroom          8,124            22
MVLS            441,327            32

Table 1: Summary of the Datasets

3.2 Relative Performance of the Algorithms

The summary of the results of our experiments with fixed parameters is given in Table 2. The results include the average rule quality metrics and CPU utilization.

                 MOGAMAR                DMEL
Dataset     Average    Time       Average    Time
            Quality    (Secs)     Quality    (Secs)
Adult        89.45      1.35       89.7       1.34
Connect-4    87.3      12          88.2      19
Chess        97.35      0.01       95.6       0.015
Diabetes     87.75      0.015      79.8       0.017
DNA          96.0       1.23       95.4       1.21
Mushroom     89.4       0.03       88.5       0.04
MVLS         82.4     900          83.6     903

Table 2: Quality of rules and run times

From Table 2, we observe that:
1. The discovered rules have a very high average quality value for all datasets, implying that both algorithms perform well.
2. The algorithms are generally fast for the smaller datasets, with average CPU times increasing for the larger ones.
3. The algorithms generally produce poorer quality rules for large datasets.
Our overall observation is that the quality of the generated rules decreases as the dataset dimension increases.

It can also be observed that the algorithms consumed quite a bit of time on the Connect-4 and MVLS datasets. These two datasets have a much larger number of transactions and attributes than the rest. We further observed that the rules with high confidence within these datasets have very low support. For instance, the rules that have a 'tie' as their consequent and a confidence of 100% in the Connect-4 dataset have a support of only 14 records. This profoundly affects the time it takes to process the records; when we set a low minimum support, the algorithm response time improves greatly.

3.3 Sensitivity to Dataset Size

In Section 3.2 we used fixed dataset sizes to assess the performance of the algorithms. With a fixed dataset size it may not be possible to gauge how well an algorithm performs as the data grow, so we now study the performance of the algorithm with increasing dataset sizes. Figure 2 summarizes the trends in the performance of the algorithms with varying dataset sizes.

Figure 2: Relative Performance of the algorithms

From Figure 2 we observe that an increase in dataset size leads to an increase in the average response time of the algorithms. Furthermore, when the data are increased two-fold, there is a multiplier effect on the response time. This implies that the performance of an ARM algorithm depends strongly on the size of the data being mined.

3.4 Sensitivity to Genetic Operators and Parameters

The experiments in Section 3.2 were carried out with specific parameter settings. Variations in the parameter values can lead to either an improvement or a deterioration in the performance of the algorithms. In this subsection we therefore study the effect of the crossover and mutation operators on the performance of the algorithms in more depth.

Effect of Crossover Operator  We studied the performance of MOGAMAR and DMEL with different values of the crossover rate. We observed that the overall performance of both algorithms under different crossover rates differs only slightly from that obtained with a fixed crossover rate. This implies that the crossover rate does not have a very profound effect on the performance of the algorithm.

Effect of Mutation Operator  Figure 3 shows the performance of the algorithms with varying mutation rates.

Figure 3: Relative Performance of MOGAMAR and DMEL for different mutation rates

It can be seen that performance was drastically affected by increasing mutation rates, implying that the mutation operator has a high impact on the performance of genetic algorithms. As the mutation rate increased, CPU utilization increased drastically, showing a big reduction in algorithm efficiency, and the disparity in the characteristics of the two algorithms became more evident. The main cause of this phenomenon is that the mutation operator reintroduces useful genotypes into the population from a diverse pool of parents. This increases the diversity of the population, because each structure in the new population undergoes a random change with a probability equal to the mutation rate; it also prevents members of the population from converging to a single very fit individual and as such increases the required number of evaluations. This implies that the mutation operator needs to be investigated more thoroughly to establish the most suitable rates.

4 Conclusion

In this paper we have proposed a new association rule mining algorithm, MOGAMAR. The approach incorporates a novel population initialization technique that ensures the production of high quality individuals; specifically designed breeding operators that eliminate defective genotypes; an adaptive mutation probability that maintains the genetic diversity of the population; and uniqueness testing. The performance of MOGAMAR has been compared with that of DMEL, an efficient ARM algorithm, on a number of benchmark datasets, with experimental results showing that our proposed approach yields equally good performance with consistently high quality rules. MOGAMAR provides the user with rules scored on five interestingness metrics, and this set can easily be extended by modifying the fitness function. We studied the effect of parameters on the performance of the algorithm, specifically the dataset size and the crossover and mutation rates. We observed that the algorithm performs poorly for large dataset sizes, and the poor performance becomes more evident as the datasets grow; this has been identified as an area requiring more research effort. It was further observed that the crossover rate does not have a big impact on performance while the mutation rate does, indicating that methods are needed for finding mutation rates that favour the performance of the algorithm. This is another area we shall be researching.
We have used a weighted fitness function for finding the best rules, but this approach may not be the best when there are several solution-quality criteria to be evaluated, because such criteria may be non-commensurate or conflicting, especially when they evaluate different aspects of a candidate solution. In our future work we shall therefore look into the possibility of modeling the ARM problem differently.

References

Agrawal, R.; Imielinski, T.; and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., 26–28.

Au, W.; Chan, K.; and Yao, X. 2003. A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation 7(6).

Carvalho, D. R.; Freitas, A. A.; and Ebecken, N. F. F. 2005. Evaluating the correlation between objective rule interestingness measures and real human interest. In PKDD, 453–461.

Davis, L. 1991. Handbook of Genetic Algorithms. VNR Computer Library, Van Nostrand Reinhold, New York.

Dehuri, S.; Patnaik, S.; Ghosh, A.; and Mall, R. 2008. Application of elitist multi-objective genetic algorithm in classification rule generation. Applied Soft Computing Journal 8:477–487.

Frank, A., and Asuncion, A. 2010. UCI machine learning repository.

Freitas, A. A. 1998. On objective measures of rule surprisingness. In Principles of Data Mining and Knowledge Discovery, 2nd European Symposium, PKDD'98, Nantes, France, 1083–1090.

Freitas, A. A. 1999. On rule interestingness measures. Knowledge-Based Systems 12:309–315.

Freitas, A. A. 2002. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, Berlin Heidelberg.

Freitas, A. A. 2007. A review of evolutionary algorithms for data mining. Soft Computing, Knowledge Discovery and Data Mining.

Ghosh, A., and Nath, B. 2004. Multi-objective rule mining using genetic algorithms. Information Sciences 163:123–133.

Giordana, A.; Anglano, C.; Bello, G. L.; and Saitta, L. 1997. A network genetic algorithm for concept learning. In Proceedings of the Sixth International Conference on Genetic Algorithms, 436–443. Morgan Kaufmann.

Hruschka, E. R.; Campello, R. J.; Freitas, A. A.; and de Carvalho, A. C. P. L. F. 1999. Survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics 12:309–315.

Kotsiantis, S., and Kanellopoulos, D. 2006. Association rules mining: A recent overview. GESTS International Transactions on Computer Science and Engineering 32(1):71–82.

Kwedlo, W., and Kretowski, M. 1999. An evolutionary algorithm using multivariate discretization for decision rule induction. In Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '99).

Lima, C. F.; Pelikan, M.; Goldberg, D.; Lobo, F. G.; Sastry, K.; and Hauschild, M. 2008. Influence of selection and replacement strategies on linkage learning in BOA. In 2007 IEEE Congress on Evolutionary Computation (CEC 2007), 1083–1090.

Liu, J., and Kwok, J. 2000a. An extended genetic rule induction algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation.

Liu, J., and Kwok, J. 2000b. An extended genetic rule induction algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation, volume 1, 458–463.

Metzler, D. 2005. Direct maximization of rank-based metrics. Technical report, University of Massachusetts, Amherst.

Michalewicz, Z. 1999. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New York.

Reynolds, A., and de la Iglesia, B. 2008. A multi-objective GRASP for partial classification. Soft Computing 13.

Surry, P., and Radcliffe, N. 1996. Inoculation to initialize evolutionary search. In Fogarty, T. C., ed., Evolutionary Computing: AISB Workshop, Springer, Brighton, U.K.