=Paper= {{Paper |id=Vol-414/paper-5 |storemode=property |title=Measures of quality of rulesets extracted from data |pdfUrl=https://ceur-ws.org/Vol-414/paper5.pdf |volume=Vol-414 |dblpUrl=https://dblp.org/rec/conf/itat/Holena08 }} ==Measures of quality of rulesets extracted from data== https://ceur-ws.org/Vol-414/paper5.pdf
             Measures of Quality of Rulesets Extracted from Data

                                                      Martin Holeňa

                     Institute of Computer Science, Academy of Sciences of the Czech Republic,
        Pod vodárenskou věží 2, 18207 Praha 8, Czech Republic, martin@cs.cas.cz, web:cs.cas.cz/~martin

Abstract. The paper deals with quality measures of whole sets of rules extracted from data, as a counterpart to more commonly used measures of individual rules. This research has been motivated by increasingly frequent extraction of non-classification rules, such as association rules and rules of observational logic, in real-world data mining tasks. The paper sketches the typology of rules extraction methods and of their rulesets, and recalls that quality measures for whole sets of rules have been so far used only in the case of classification rulesets. It then proposes three possible ways in which such measures can be extended to general rulesets. The paper also recalls the possibility of measuring the dependence of a classification ruleset on parameters of the classification method by means of ROC curves, and proposes a generalization of ROC curves to general rulesets. Finally, a brief illustration on rulesets extracted by means of the method GUHA is given.

1 Introduction

Logical formulas of specific kinds, usually called rules, are a traditional way of formally representing knowledge. Therefore, it is not surprising that they are also the most frequent representation of the knowledge discovered in data mining. Existing methods for rules extraction are based on a broad variety of paradigms and theoretical principles. However, methods relying on different underlying assumptions can lead to the extraction of different or even contradictory rulesets from the same data. Moreover, the set of rules extracted with a particular method can substantially depend on some tunable parameter or parameters of the method, such as significance level, thresholds, size parameters, trade-off coefficients etc. For that reason, it is desirable to have measures of various qualitative aspects of the extracted rulesets. So far, such measures are available only for sets of classification rules, and their dependence on tunable parameters can be described only for classification into two classes [10, 15]. As far as more general kinds of rules are concerned, measures of quality have been proposed only for individual rules [6, 11, 24, 26, 29], or for contrast sets of rules, which finally can be replaced with a single rule [2, 16]; if a whole ruleset is taken into consideration, then only as a context for measuring the quality of an individual rule [27, 28].

The research reported in this paper has been motivated by increasingly frequent extraction of non-classification rules in real-world data mining tasks. The paper discusses three possible ways of extending existing ruleset quality measures from classification to general rulesets. The proposed extensions are introduced in Section 4, after the basic typology of rules extraction methods and examples of measures for classification rulesets are recalled in the following two sections, and before a generalization of ROC curves is proposed in Section 5. The paper concludes with a brief illustration on rulesets extracted with the method GUHA.

2 Typology of rules extraction methods

The most natural base for differentiating between existing rules extraction methods is the syntax and semantics of the extracted rules. Syntactical differences between them are, however, not very deep since principally, any rule $r$ has one of the forms $S_r \sim S_r'$, or $A_r \to C_r$, where $S_r$, $S_r'$, $A_r$ and $C_r$ are formulas of the considered logic, and $\sim$, $\to$ are symbols of the language of that logic. The difference between both forms concerns semantic properties of the symbols $\sim$ and $\to$: $S_r \sim S_r'$ is symmetric with respect to $S_r$, $S_r'$ in the sense that its validity always coincides with that of $S_r' \sim S_r$, whereas $A_r \to C_r$ is not symmetric with respect to $A_r$, $C_r$ in that sense. In the case of a propositional logic, $\sim$ and $\to$ are the connectives equivalence and implication, respectively, whereas in the case of a predicate logic, they are generalized quantifiers. To distinguish the formulas involved in the asymmetric case, $A_r$ is called antecedent and $C_r$ consequent of $r$.

More important are the semantics of the rules (cf. [6]), especially the difference between rules of the Boolean logic and rules of a fuzzy logic. Due to the semantics of Boolean and fuzzy formulas, the former are valid for crisp sets of objects, whereas the validity of the latter is a fuzzy set on the universe of all considered objects. Boolean rulesets are extracted more frequently, especially some specific types of them, such as classification rulesets [11, 15]. Those are sets of implications such that $(A_r)_{r \in R}$ and $\{C_r\}_{r \in R}$ partition the set $O$ of considered objects, where $R$ is the considered ruleset, and $\{C_r\}_{r \in R}$ stands for the set of distinct formulas in $(C_r)_{r \in R}$. Abandoning the requirement that $(A_r)_{r \in R}$ partitions $O$ (at least in the sense of a crisp
partitioning) allows generalizing those rulesets also to fuzzy antecedents. For Boolean antecedents, however, this requirement entails a natural definition of the validity of a whole classification ruleset $R$ for an object $x$. Assuming that all information about $x$ conveyed by $R$ is conveyed by the single rule $r$ covering $x$ (i.e., with $A_r$ valid for $x$), the validity of $R$ for $x$ can be defined to coincide with the validity of $A_r \to C_r$ for that $r$, which in turn equals the validity of $C_r$ for $x$.

As far as the Boolean predicate logic is concerned, generalized quantifiers both for symmetric and for asymmetric rules were studied in the 1970s within the framework of the observational logic [13], which is a Boolean predicate logic with generalized quantifiers. For a set of data about $n$ objects, the truth evaluation of the Boolean predicate $\varphi$ on those objects is a vector $\|\varphi\| \in \{0, 1\}^n$, whereas the truth evaluation of a sentence $(Qx)(\varphi_1(x), \dots, \varphi_m(x))$ consisting of $m$ Boolean predicates $\varphi_1, \dots, \varphi_m$ and an $m$-ary generalized quantifier $Q$ is the function value

$$\|(Qx)(\varphi_1(x), \dots, \varphi_m(x))\| = \mathrm{Tf}_Q(\|\varphi_1\|, \dots, \|\varphi_m\|), \tag{1}$$

of a $\{0, 1\}$-valued function $\mathrm{Tf}_Q$ on the set of $m$-column binary matrices, which is called truth function of the quantifier $Q$. Observational logic underlies one of the earliest methods for the extraction of general rules from data, called General Unary Hypotheses Automaton (GUHA). In GUHA, the truth function $\mathrm{Tf}_Q$ of a generalized quantifier $Q$ is always a function of the 4-fold table

$$\begin{array}{ll|cc} & & S_r' & \neg S_r' \\ & & C_r & \neg C_r \\ \hline S_r & A_r & a & b \\ \neg S_r & \neg A_r & c & d \end{array} \tag{2}$$

Hence, $\mathrm{Tf}_Q$ is a $\{0, 1\}$-valued function on quadruples of nonnegative integers. For symmetric rules, GUHA uses quantifiers fulfilling

$$a' \ge a \;\&\; b' \le b \;\&\; c' \le c \;\&\; d' \ge d \;\&\; \mathrm{Tf}_Q(a, b, c, d) = 1 \;\to\; \mathrm{Tf}_Q(a', b', c', d') = 1. \tag{3}$$

They are called associational quantifiers. For asymmetric rules, it uses quantifiers fulfilling the stronger condition

$$a' \ge a \;\&\; b' \le b \;\&\; \mathrm{Tf}_Q(a, b, c, d) = 1 \;\to\; \mathrm{Tf}_Q(a', b', c', d') = 1, \tag{4}$$

which are called implicational quantifiers. This condition covers also the frequently encountered association rules [1, 6, 40] (since methods for the extraction of association rules have been developed outside the framework of observational logic, the terminology is a bit confusing here: although association rules are asymmetric, their name evokes the quantifier for the symmetric ones).

Orthogonally to the typology according to the semantics of the extracted rules, all extraction methods can be divided into two large groups:

– Methods that extract logical rules from data directly, without any intermediate formal representation of the discovered knowledge. Such methods have always formed the mainstream of the extraction of Boolean rules: from the observational logic methods [13] and the method AQ [30, 31] in the late 1970s, through the extraction of association rules [1, 40] and the method CN2 [4], relying on a paradigm similar to that of AQ, to recent methods based on inductive logic programming [5, 33] and genetic algorithms [9]. They include also important methods for fuzzy rules, in particular ANFIS [22, 23] and NEFCLASS [34, 35], fuzzy generalizations of observational logic [18, 19] and a recent method based on fuzzy transform [36].
– Methods that employ some intermediate representation of the extracted knowledge, useful by itself. This group includes two important kinds of methods: classification trees [3, 37] and methods based on artificial neural networks (ANN). The latter are used both for Boolean and for fuzzy rules [7, 21, 39] (cf. also the survey papers [32, 38]).

3 Existing measures for classification rulesets

A survey of measures of quality for classification rulesets (with possibly fuzzy antecedents) has been given in the monograph [15]. All measures have been divided there into four groups: inaccuracy, imprecision, inseparability and resemblance. Space limitations allow recalling here only the main representatives of the more important groups:

Inaccuracy measures the discrepancy between the true class of the considered objects and the class predicted by the ruleset. Its most frequently encountered representative is the quadratic score (also called Brier score):

$$\mathrm{Inacc} = \frac{1}{|O|} \sum_{x \in O} \sum_{C \in \{C_r\}_{r \in R}} \left( \delta_C(x) - \hat{\delta}_C(x) \right)^2, \tag{5}$$

where $|\cdot|$ denotes cardinality, $O$ is the considered set of objects, $\delta_C(x) \in \{0, 1\}$ is the validity of the proposition $C$ for $x \in O$, and $\hat{\delta}_C(x)$ is the agreement between $C$ and the class predicted for $x$ by $R$. In the general
case of a fuzzy logic, $\hat{\delta}_C(x) = \max_{C_r = C} \|A_r\|_x$, with $\|A_r\|_x \in \langle 0, 1 \rangle$ denoting the truth grade of $A_r$ for $x$.

Imprecision measures the discrepancy between the probability distribution of the classes, conditioned on the values of attributes occurring in antecedents, and the class predicted by the ruleset. Its most common representative is

$$\mathrm{Impr} = \frac{1}{|O|} \sum_{x \in O} \sum_{C \in \{C_r\}_{r \in R}} \left( \delta_C(x) - \hat{\delta}_C(x) \right) \left( 1 - \hat{\delta}_C(x) \right)^2. \tag{6}$$

As was already mentioned in the introduction, the extracted ruleset can substantially depend on tunable parameters of the employed method. This was so far systematically studied only for dichotomous classification with $R = \{A \to C, \neg A \to \neg C\}$. In that case, putting $A_r = A$, $C_r = C$ allows the information about the validity of $A$ and $C$ for $O$ to be again summarized by means of the 4-fold table (2), which also depends on the parameter values. The influence of the parameter values on the result of dichotomous classification is usually investigated by means of the measures sensitivity $= \frac{a}{a+c}$ and specificity $= \frac{d}{b+d}$ [15]. Connecting points (1 − specificity, sensitivity) $= \left( \frac{b}{b+d}, \frac{a}{a+c} \right)$ for the considered parameter values forms a curve with graph in the unit square, called receiver operating characteristic (ROC), due to the area where such curves have first been in routine use. In machine learning, a modified version of those curves has been proposed, in which the points connected for considered parameter values are $(b, a)$ [10]. The graph of such a curve then lies in the rectangle with vertices $(0, 0)$ and $(b+d, a+c)$, and is called coverage graph.

The graphs of ROC curves and coverage graphs can provide information about the influence of parameter values not only on the sensitivity and specificity, but also on other measures. It is sufficient to complement the graph with isolines of the measure and to investigate their intersections with the original curve [10].

4 Three extensions to more general kinds of rules

In the particular case of classification rulesets with Boolean antecedents, some algebra allows (5)–(6) to be substantially simplified:

$$\mathrm{Inacc} = \frac{2|O^-|}{|O|} = 1 - \frac{|O^+| - |O^-|}{|O|}, \qquad \mathrm{Impr} = \frac{|O^-|}{|O|} = 1 - \frac{|O^+|}{|O|}, \tag{7}$$

where

$$O^+ = \{x \in O : R \text{ is valid for } x\}, \qquad O^- = \{x \in O : R \text{ is not valid for } x\}. \tag{8}$$

This not only shows that, in the case of Boolean antecedents, the quadratic score is sufficient to describe also the imprecision, but also suggests an approach to extending those measures to general rulesets: to use (7)–(8) as the definition of measures (5)–(6). More generally, any measure of quality of classification rulesets with Boolean antecedents (e.g., any measure surveyed in [15]) that can be reformulated by means of $O^+$ and $O^-$ can be extended in such a way that the reformulation is used as the definition of that measure for general rulesets.

For sets of asymmetric rules, also the notion of covering an object by a rule, which was recalled in Section 2, can be generalized. Notice, however, that for fuzzy antecedents, the validity of $A_r$, $r \in R$ is a fuzzy set on $O$. Consequently, the set $O_R$ of objects covered by $R$ is a fuzzy set on $O$ with the membership function

$$\mu_R(x) = \|(\exists r \in R)\, A_r\|_x = \max_{r \in R} \|A_r\|_x. \tag{9}$$

Observe that according to (9), $O_R = O$ for classification rulesets with Boolean antecedents. Therefore, various generalizations of classification measures to general rulesets of asymmetric rules are possible: wherever $O$ occurs in the definition of a measure for classification rulesets, either $O$ or $O_R$ can occur in its general definition, provided $O_R \ne \emptyset$. To allow unified treatment of symmetric and asymmetric rules, the concept of covering an object by a rule will be extended also to symmetric rules, in such a way that an object $x$ is covered by $S_r \sim S_r'$ if either $S_r$ or $S_r'$ is valid for $x$. Hence, a counterpart of (9) for a set $R$ is a fuzzy set with the membership function

$$\mu_R(x) = \|(\exists r \in R)(S_r \vee S_r')\|_x = \max_{r \in R} \max(\|S_r\|_x, \|S_r'\|_x). \tag{10}$$

According to (8), the proposed way of extending measures of quality from classification rulesets with Boolean antecedents to general rulesets requires generalizing the concept of validity of a general ruleset for an object. However, there are multiple possibilities for such a generalization. Indeed, at least the following points of view are possible:

Boolean validity of the ruleset based on simultaneous validity of all covering rules. According to this point of view, the validity of a ruleset $R$ for a covered object $x$ is a Boolean property expressing the simultaneous validity of all rules that cover $x$.
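For rules whose covering and validity are both crisp, this point of view, combined with the right-hand forms of (7), can be sketched in a few lines of Python. The sketch is only illustrative: the matrix layout (rows are rules, columns are objects) and the names `covers`, `valid`, `ruleset_validity_all` and `inacc_impr` are assumptions made for the example and do not come from the paper.

```python
def ruleset_validity_all(covers, valid):
    """Split covered objects into O+ and O- (all covering rules must be valid).

    covers[r][x] is True if rule r covers object x, valid[r][x] is True if
    rule r is valid for object x. The layout is a hypothetical choice and
    assumes at least one rule.
    """
    n_rules = len(covers)
    n_objects = len(covers[0])
    o_plus, o_minus = set(), set()
    for x in range(n_objects):
        covering = [r for r in range(n_rules) if covers[r][x]]
        if not covering:
            continue  # uncovered objects belong to neither O+ nor O-
        if all(valid[r][x] for r in covering):
            o_plus.add(x)
        else:
            o_minus.add(x)
    return o_plus, o_minus


def inacc_impr(o_plus, o_minus, n_objects):
    """Right-hand forms of (7): 1 - (|O+| - |O-|)/|O| and 1 - |O+|/|O|."""
    inacc = 1 - (len(o_plus) - len(o_minus)) / n_objects
    impr = 1 - len(o_plus) / n_objects
    return inacc, impr


# Two rules, four objects; object 3 is covered by no rule.
covers = [[True, True, False, False],
          [False, True, True, False]]
valid = [[True, False, True, True],
         [True, True, True, False]]
o_plus, o_minus = ruleset_validity_all(covers, valid)
print(sorted(o_plus), sorted(o_minus), inacc_impr(o_plus, o_minus, 4))
```

In this toy input, object 1 is covered by both rules but only one of them is valid for it, so it falls into $O^-$; the uncovered object 3 contributes only to $|O|$.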
Consequently, the sets $O^+$ and $O^-$ defined in (8) are crisp sets

$$O^+ = \{x \in O : \mu_R(x) > 0 \;\&\; (\forall r \in R)\; \|r \text{ covers } x \;\&\; r \text{ is valid for } x\| = \|r \text{ covers } x\|\}, \tag{11}$$

$$O^- = \{x \in O : \mu_R(x) > 0 \;\&\; (\exists r \in R)\; \|r \text{ covers } x \;\&\; r \text{ is valid for } x\| < \|r \text{ covers } x\|\}, \tag{12}$$

where

$$\|r \text{ covers } x\| = \begin{cases} \|(S_r \vee S_r')\|_x & \text{for symmetric rules}, \\ \|A_r\|_x & \text{for asymmetric rules}, \end{cases} \tag{13}$$

and similarly

$$\|r \text{ covers } x \;\&\; r \text{ is valid for } x\| = \begin{cases} \|(S_r \vee S_r') \;\&\; r\|_x & \text{for symmetric rules}, \\ \|A_r \;\&\; r\|_x & \text{for asymmetric rules}. \end{cases} \tag{14}$$

The following consequences of this point of view are worth noticing:
(i) It is immaterial how the truth grade $\|r\|_x$ of a rule $r$ being valid for an object $x$ is evaluated (thus also how $\|\neg r\|_x$ is evaluated).
(ii) If $\mu_R(x) = 0$, then $x \notin O^+ \cup O^-$.
(iii) For classification rulesets with Boolean antecedents, the validity of $R$ according to this point of view coincides with the definition in Section 2 because in that case, there is exactly one rule that covers $x$.

Boolean validity of the ruleset based on the validity of the majority of covering rules. According to this point of view, the validity of a ruleset $R$ for a covered object $x$ is a Boolean property expressing the validity of most of the rules that cover $x$. Consequently, the sets $O^+$ and $O^-$ in (8) are crisp sets

$$O^+ = \Big\{x \in O : \mu_R(x) > 0 \;\&\; \sum_{r \in R} \|r \text{ covers } x \;\&\; r \text{ is valid for } x\| > \sum_{r \in R} \|r \text{ covers } x \;\&\; \neg r \text{ is valid for } x\|\Big\}, \tag{15}$$

$$O^- = \Big\{x \in O : \mu_R(x) > 0 \;\&\; \sum_{r \in R} \|r \text{ covers } x \;\&\; r \text{ is valid for } x\| \le \sum_{r \in R} \|r \text{ covers } x \;\&\; \neg r \text{ is valid for } x\|\Big\}, \tag{16}$$

where the truth grade $\|r \text{ covers } x \;\&\; \neg r \text{ is valid for } x\|$ is again evaluated according to (14), replacing $r$ with $\neg r$. Observe that also this point of view has the above consequences (i)–(iii), the last one again due to the fact that there is exactly one rule covering $x$.

Fuzzy validity of the ruleset based on the relative validity of covering rules. In this case, the validity of a ruleset $R$ for a covered object $x$ is a fuzzy property expressing the ratio of the validity of rules from $R$ for $x$ to the covering of $x$ with those rules. Consequently, the sets $O^+$ and $O^-$ are fuzzy sets on $O$ with memberships $\mu_+$ and $\mu_-$, respectively, such that if $\mu_R(x) > 0$,

$$\mu_+(x) = \frac{\sum_{r \in R} \|r \text{ covers } x \;\&\; r \text{ is valid for } x\|}{\sum_{r \in R} \|r \text{ covers } x\|}, \tag{17}$$

$$\mu_-(x) = \frac{\sum_{r \in R} \|r \text{ covers } x \;\&\; \neg r \text{ is valid for } x\|}{\sum_{r \in R} \|r \text{ covers } x\|}, \tag{18}$$

where the involved truth grades are again evaluated according to (13) and (14). Moreover, (17)–(18) will be complemented with the definition $\mu_+(x) = \mu_-(x) = 0$ if $\mu_R(x) = 0$, to get again the validity of (ii) above, whereas (i) and (iii) are consequences also of this point of view. Further, the fact that $O^+$ and $O^-$ are now fuzzy sets implies that whenever $|O^+|$ or $|O^-|$ occur in the definitions of quality measures for Boolean classification rulesets, fuzzy cardinalities have to be used in their generalizations to general rulesets according to this point of view. Hence,

$$|O^+| = \sum_{x \in O} \mu_+(x), \qquad |O^-| = \sum_{x \in O} \mu_-(x). \tag{19}$$

For example, the measure

$$\mathrm{Inacc} = 1 - \frac{\sum_{x \in O} (\mu_+(x) - \mu_-(x))}{|O|} \tag{20}$$

is a generalization of (5), whereas the measures

$$\mathrm{Impr}_1 = 1 - \frac{\sum_{x \in O} \mu_+(x)}{|O|}, \tag{21}$$

$$\mathrm{Impr}_2 = 1 - \frac{\sum_{x \in O} \mu_+(x)}{|O_R|} = 1 - \frac{\sum_{x \in O} \mu_+(x)}{\sum_{x \in O} \mu_R(x)} \tag{22}$$

are generalizations of (6).

5 Extensions of ROC curves to more general kinds of rules

Observe that in the case of Boolean classification with $R = \{A \to C, \neg A \to \neg C\}$, the information about the
validity of $R$ for objects $x \in O$ can be also viewed as information about the validity of a ruleset $R' = \{A \to C\}$. However, $R'$ is no longer a classification ruleset, but only a general one, which can be described only by means of the above introduced sets $O_R$, $O^+$, $O^-$. In particular, $|O^+| = a$ and $|O^-| = b$, which suggests the possibility of generalizing coverage graphs introduced in Section 3 to general rulesets by means of a curve connecting points $(|O^-|, |O^+|)$ for each of the values of the considered parameters. For a generalization of ROC curves to general rulesets, those points have to be scaled to the unit square. Since the resulting curve will be used to investigate the dependence on parameter values, the scaling factor itself must be independent of those values. The only available factor fulfilling this condition is the number of objects, $|O|$ (the other available factors, $|O_R|$, $|O^+|$ and $|O^-|$, depend on the evaluations $\|S_r\|$ and $\|S_r'\|$, or $\|A_r\|$ and $\|C_r\|$, which in turn depend on the parameter values). Consequently, the proposed generalization of ROC curves will connect points $\left( \frac{|O^-|}{|O|}, \frac{|O^+|}{|O|} \right)$.

For practical construction of the proposed generalization of ROC curves, the following proposition, proven in [17], can be quite useful:

Proposition 1. Let the covering of individual objects with individual rules be a Boolean property (i.e., the set of rules covering a particular object $x$ be a crisp subset of $R$). Then irrespective of which of the above points of view of ruleset validity is adopted, there always exists a constant $c \in (0, 1\rangle$ and an increasing bijection $g : \langle 0, c \rangle \to \langle 0, 1 \rangle$ such that

$$|O^+| + |O^-| \le \max\Big(1, \max_{x \in \langle 0, c \rangle} x + g^{-1}(1 - g(x))\Big)\, |O|. \tag{23}$$

Moreover, in the particular cases of Boolean logic and of all three fundamental fuzzy logics (Łukasiewicz, Gödel, product), (23) holds with $c = 1$ and $g$ equal to identity,

$$|O^+| + |O^-| \le |O|. \tag{24}$$

Thus in those cases, the points $\left( \frac{|O^-|}{|O|}, \frac{|O^+|}{|O|} \right)$, forming the generalization of ROC curves, lie below the diagonal $(\langle 0, 1 \rangle, \langle 1, 0 \rangle)$.

The proposition is illustrated in Figure 1, together with isolines of the three example measures introduced in (20)–(22). Observe that the isolines of $\mathrm{Impr}_2$ depend on the relationship between the three cardinalities $|O^+| = \sum_{x \in O} \mu_+(x)$, $|O^-| = \sum_{x \in O} \mu_-(x)$ and $|O_R| = \sum_{x \in O} \mu_R(x)$. The isolines depicted in Figure 1(c) correspond to the relationship $|O_R| = |O^+| + |O^-|$, which is true in Łukasiewicz logic (thus in particular also in Boolean logic).

6 Experimentally testing the approach

The proposed approach has been so far experimentally tested for six rules extraction methods on three benchmark data sets, as well as on data from one real-world knowledge discovery task [20]. For each method, 1–3 parameters were tuned, their values being chosen among 2–10 possibilities. For some data sets, some combinations of parameter values did not extract any rules. Whenever a particular combination of parameter values extracted a nonempty ruleset from the considered data, it was tested on those data by means of a 10-fold crossvalidation. Consequently, the number of rulesets extracted from each data set varied between 1000 and 1500.

Fig. 1. Isolines of the three measures introduced in (20)–(22), drawn with respect to the coordinates $\left( \frac{|O^-|}{|O|}, \frac{|O^+|}{|O|} \right)$ of points forming the proposed generalization of ROC curves.

As a very brief illustration, Figure 2 shows the proposed generalization of ROC curves for two rulesets extracted from the best known benchmark set, the iris
data, originally used in the 1930s by R.A. Fisher [8], by means of the
GUHA quantifier founded implication. This quantifier, denoted →s,θ,
s, θ ∈ (0, 1], has its truth function Tf→s,θ defined in such a way that
the rule Ar →s,θ Cr is valid exactly for those data for which the
conditional probability p(Cr|Ar) of the validity of Cr conditioned on Ar,
estimated with the unbiased estimate a/(a+b), is at least θ, while Ar
and Cr are simultaneously valid in at least the proportion s of the
data [13]. Hence, Tf→s,θ = 1 iff a/(a+b) ≥ θ & a/(a+b+c+d) ≥ s.
As was pointed out in [14], rules with this quantifier are actually
association rules with support s and confidence θ. Each curve corresponds
to changing only one of the parameters s, θ, with the value of the other
fixed.

Fig. 2. Example of generalized ROC curves for rulesets extracted from the
iris data by means of the GUHA quantifier founded implication

7     Conclusions

The paper has dealt with quality measures of rules extracted from data,
though not in the usual context of individual rules, but in the context
of whole rulesets. Three kinds of extensions of measures already in use
for classification rulesets have been proposed. In addition, the concept
of ROC curves has been generalized, to enable investigating the
dependence of general rulesets on the values of parameters of the
extraction method.
    The paper actually discusses some general aspects related to an
ongoing investigation into the possibility of reflecting uncertain
validity of rulesets extracted from data when measuring their quality.
The outcomes of that investigation are intended to be published
elsewhere [17]. They comprise theoretical elaboration of the last
proposed kind of extensions of ruleset quality measures, as well as
results of extensive experimental tests on rulesets extracted from
benchmark and real-world data sets by means of six methods chosen to
cover as broad a spectrum of rules extraction methods as possible. Those
results indicate that the approach is feasible and can contribute to the
ultimate objective of quality measures: to allow comparing the knowledge
extracted with different data mining methods and investigating how the
extracted knowledge depends on the values of their parameters.

Acknowledgment

The research reported in this paper has been supported by the grant
No. 201/08/1744 of the Grant Agency of the Czech Republic and partially
supported by the Institutional Research Plan AV0Z10300504.

References

 1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo.
    Fast discovery of association rules. In Advances in Knowledge
    Discovery and Data Mining, pages 307–328. AAAI Press, Menlo Park,
    1996.
 2. S.D. Bay and M.J. Pazzani. Detecting group differences: Mining
    contrast sets. Data Mining and Knowledge Discovery, 5:213–246, 2001.
 3. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone.
    Classification and Regression Trees. Wadsworth, Belmont, 1984.
 4. P. Clark and R. Boswell. Rule induction with CN2: Some recent
    improvements. In Machine Learning – EWSL-91, pages 151–163.
    Springer Verlag, New York, 1991.
 5. L. De Raedt. Interactive Theory Revision: An Inductive Logic
    Programming Approach. Academic Press, London, 1992.
 6. D. Dubois, E. Hüllermeier, and H. Prade. A systematic approach to
    the assessment of fuzzy association rules. Data Mining and
    Knowledge Discovery, 13:167–192, 2006.
 7. W. Duch, R. Adamczak, and K. Grabczewski. A new methodology of
    extraction, optimization and application of crisp and fuzzy logical
    rules. IEEE Transactions on Neural Networks, 11:277–306, 2000.
 8. R.A. Fisher. The use of multiple measurements in taxonomic
    problems. Annals of Eugenics, 7:179–188, 1936.
 9. A.A. Freitas. Data Mining and Knowledge Discovery with Evolutionary
    Algorithms. Springer Verlag, Berlin, 2002.
10. J. Fürnkranz and P.A. Flach. ROC ’n’ rule learning – towards a
    better understanding of covering algorithms. Machine Learning,
    58:39–77, 2005.
11. L. Geng and H.J. Hamilton. Choosing the right lens: Finding what is
    interesting in data mining. In F. Guillet and H.J. Hamilton,
    editors, Quality Measures in Data Mining, pages 3–24. Springer
    Verlag, Berlin, 2007.
12. P. Hájek. Metamathematics of Fuzzy Logic. Kluwer Academic
    Publishers, Dordrecht, 1998.
13. P. Hájek and T. Havránek. Mechanizing Hypothesis Formation.
    Springer Verlag, Berlin, 1978.
14. P. Hájek and M. Holeňa. Formal logics of discovery and hypothesis
    formation by machine. Theoretical Computer Science, 292:345–357,
    2003.
15. D.J. Hand. Construction and Assessment of Classification Rules.
    John Wiley and Sons, New York, 1997.
16. R.J. Hilderman and T. Peckham. Statistical methodologies for mining
    potentially interesting contrast sets. In F. Guillet and
    H.J. Hamilton, editors, Quality Measures in Data Mining, pages
    153–177. Springer Verlag, Berlin, 2007.
17. M. Holeňa. Measures of ruleset quality capable to represent
    uncertain validity. Submitted to International Journal of
    Approximate Reasoning.
18. M. Holeňa. Fuzzy hypotheses for Guha implications. Fuzzy Sets and
    Systems, 98:101–125, 1998.
19. M. Holeňa. Fuzzy hypotheses testing in the framework of fuzzy
    logic. Fuzzy Sets and Systems, 145:229–252, 2004.
20. M. Holeňa. Neural networks for extraction of fuzzy logic rules with
    application to EEG data. In B. Ribeiro, R.F. Albrecht, and
    A. Dobnikar, editors, Adaptive and Natural Computing Algorithms,
    pages 369–372. Springer Verlag, Wien, 2005.
21. M. Holeňa. Piecewise-linear neural networks and their relationship
    to rule extraction from data. Neural Computation, 18:2813–2853,
    2006.
22. J.S.R. Jang. ANFIS: Adaptive-network-based fuzzy inference system.
    IEEE Transactions on Systems, Man, and Cybernetics, 23:665–685,
    1993.
23. J.S.R. Jang and C.T. Sun. Neuro-fuzzy modeling and control. The
    Proceedings of the IEEE, 83:378–406, 1995.
24. K.A. Kaufman and R.S. Michalski. An adjustable description quality
    measure for pattern discovery using the AQ methodology. Journal of
    Intelligent Information Systems, 14:199–216, 2000.
25. E.P. Klement, R. Mesiar, and E. Pap. Triangular Norms. Kluwer
    Academic Publishers, Dordrecht, 2000.
26. S. Lallich, O. Teytaud, and E. Prudhomme. Association rule
    interestingness: Measure and statistical validation. In F. Guillet
    and H.J. Hamilton, editors, Quality Measures in Data Mining, pages
    251–275. Springer Verlag, Berlin, 2007.
27. P. Lenca, B. Vaillant, P. Meyer, and S. Lallich. Association rule
    interestingness measures: Experimental and theoretical studies. In
    F. Guillet and H.J. Hamilton, editors, Quality Measures in Data
    Mining, pages 51–76. Springer Verlag, Berlin, 2007.
28. L. Lerman and J. Azé. Une mesure probabiliste contextuelle
    discriminante de qualité des règles d’association. In EGC 2003:
    Extraction et Gestion des Connaissances, pages 247–263. Hermes
    Science Publications, Lavoisier, 2003.
29. K. McGarry. A survey of interestingness measures for knowledge
    discovery. Knowledge Engineering Review, 20:39–61, 2005.
30. R.S. Michalski. Knowledge acquisition through conceptual
    clustering: A theoretical framework and algorithm for partitioning
    data into conjunctive concepts. International Journal of Policy
    Analysis and Information Systems, 4:219–243, 1980.
31. R.S. Michalski and K.A. Kaufman. Learning patterns in noisy data.
    In Machine Learning and Its Applications, pages 22–38. Springer
    Verlag, New York, 2001.
32. S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: Survey in
    soft computing framework. IEEE Transactions on Neural Networks,
    11:748–768, 2000.
33. S. Muggleton. Inductive Logic Programming. Academic Press, London,
    1992.
34. D. Nauck. Fuzzy data analysis with NEFCLASS. International Journal
    of Approximate Reasoning, 32:103–130, 2002.
35. D. Nauck and R. Kruse. NEFCLASS-X: A neuro-fuzzy tool to build
    readable fuzzy classifiers. BT Technology Journal, 3:180–192, 1998.
36. V. Novák, I. Perfilieva, A. Dvořák, C.Q. Chen, Q. Wei, and P. Yan.
    Mining pure linguistic associations from numerical data. To appear
    in International Journal of Approximate Reasoning.
37. J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann
    Publishers, San Francisco, 1992.
38. A.B. Tickle, R. Andrews, M. Golea, and J. Diederich. The truth will
    come to light: Directions and challenges in extracting rules from
    trained artificial neural networks. IEEE Transactions on Neural
    Networks, 9:1057–1068, 1998.
39. H. Tsukimoto. Extracting rules from trained neural networks. IEEE
    Transactions on Neural Networks, 11:333–389, 2000.
40. M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New parallel
    algorithms for fast discovery of association rules. Data Mining and
    Knowledge Discovery, 1:343–373, 1997.
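
As a supplementary illustration of the founded implication quantifier →s,θ used in the paper's iris example, its truth function can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the GUHA implementation: the function names founded_implication and ruleset_sizes are hypothetical, and a, b, c, d are assumed to be the usual four-fold table counts (antecedent and consequent both valid, antecedent valid only, consequent valid only, neither valid). The second function mimics how one curve of the generalized ROC plot arises, by varying θ while s stays fixed.

```python
def founded_implication(a, b, c, d, s, theta):
    """Return 1 if the rule is valid under ->_{s,theta}, else 0.

    Assumed meaning of the counts (four-fold table):
    a = antecedent and consequent valid, b = antecedent valid only,
    c = consequent valid only, d = neither valid.
    """
    if not (0 < s <= 1 and 0 < theta <= 1):
        raise ValueError("s and theta must lie in (0, 1]")
    n = a + b + c + d
    if n == 0 or a + b == 0:
        # a = 0 here, so the support condition cannot hold for s > 0
        return 0
    confidence = a / (a + b)   # estimate of p(C | A)
    support = a / n            # proportion of data where A and C both hold
    return int(confidence >= theta and support >= s)


def ruleset_sizes(tables, s, thetas):
    """Count valid rules for each theta in thetas, with s fixed.

    tables is a list of (a, b, c, d) tuples, one per candidate rule.
    """
    return [
        sum(founded_implication(a, b, c, d, s, theta) for a, b, c, d in tables)
        for theta in thetas
    ]


if __name__ == "__main__":
    # Hypothetical four-fold tables for three candidate rules.
    candidate_tables = [(30, 10, 20, 40), (10, 10, 40, 40), (45, 5, 5, 45)]
    print(ruleset_sizes(candidate_tables, s=0.25, thetas=[0.5, 0.75, 1.0]))
```

Raising θ with s fixed can only shrink the set of valid rules, which is why each parameter sweep traces a single monotone curve.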