<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Measures of Quality of Rulesets Extracted from Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <email>martin@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodárenskou věží 2, 18207 Praha 8</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper deals with quality measures of whole sets of rules extracted from data, as a counterpart to the more commonly used measures of individual rules. This research has been motivated by the increasingly frequent extraction of non-classification rules, such as association rules and rules of observational logic, in real-world data mining tasks. The paper sketches the typology of rules extraction methods and of their rulesets, and recalls that quality measures for whole sets of rules have so far been used only in the case of classification rulesets. It then proposes three possible ways in which such measures can be extended to general rulesets. The paper also recalls the possibility of measuring the dependence of a classification ruleset on the parameters of the classification method by means of ROC curves, and proposes a generalization of ROC curves to general rulesets. Finally, a brief illustration on rulesets extracted by means of the method GUHA is given.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Logical formulas of specific kinds, usually called rules, are a traditional way of formally representing knowledge. Therefore, it is not surprising that they are also the most frequent representation of the knowledge discovered in data mining. Existing methods for rules extraction are based on a broad variety of paradigms and theoretical principles. However, methods relying on different underlying assumptions can lead to the extraction of different or even contradictory rulesets from the same data. Moreover, the set of rules extracted with a particular method can substantially depend on one or more tunable parameters of the method, such as significance levels, thresholds, size parameters, trade-off coefficients, etc. For that reason, it is desirable to have measures of various qualitative aspects of the extracted rulesets. So far, such measures are available only for sets of classification rules, and their dependence on tunable parameters can be described only for classification into two classes [10, 15]. As far as more general kinds of rules are concerned, measures of quality have been proposed only for individual rules [6, 11, 24, 26, 29], or for contrast sets of rules, which can ultimately be replaced with a single rule [2, 16]. If a whole ruleset is taken into consideration at all, it is only as a context for measuring the quality of an individual rule [27, 28].</p>
      <p>The research reported in this paper has been motivated by the increasingly frequent extraction of non-classification rules …</p>
    </sec>
    <sec id="sec-1-1">
      <title>Typology of rules extraction methods</title>
      <p>The most natural basis for differentiating between existing rules extraction methods is the syntax and semantics of the extracted rules. The syntactic differences between them are, however, not very deep, since in principle any rule r has one of the forms S<sub>r</sub> ∼ S′<sub>r</sub> or A<sub>r</sub> → C<sub>r</sub>. Here S<sub>r</sub>, S′<sub>r</sub>, A<sub>r</sub> and C<sub>r</sub> are formulas of the considered logic, and ∼, → are symbols of the language of that logic. The difference between the two forms concerns semantic properties of the symbols ∼ and →. S<sub>r</sub> ∼ S′<sub>r</sub> is symmetric with respect to S<sub>r</sub>, S′<sub>r</sub> in the sense that its validity always coincides with that of S′<sub>r</sub> ∼ S<sub>r</sub>, whereas A<sub>r</sub> → C<sub>r</sub> is not symmetric with respect to A<sub>r</sub>, C<sub>r</sub> in that sense. In the case of a propositional logic, ∼ and → are the connectives equivalence and implication, respectively, whereas in the case of a predicate logic, they are generalized quantifiers. To distinguish the formulas involved in the asymmetric case, A<sub>r</sub> is called the antecedent and C<sub>r</sub> the consequent of r.</p>
      <p>More important is the semantics of the rules (cf. [6]), especially the difference between rules of the Boolean logic and rules of a fuzzy logic. Due to the semantics of Boolean and fuzzy formulas, the former are valid for crisp sets of objects, whereas the validity of the latter is a fuzzy set on the universe of all considered objects. Boolean rulesets are extracted more frequently, especially some specific types of them, such as classification rulesets [11, 15]. Those are sets of implications such that (A<sub>r</sub>)<sub>r∈R</sub> and {C<sub>r</sub>}<sub>r∈R</sub> partition the set O of considered objects. Here R is the considered ruleset, and {C<sub>r</sub>}<sub>r∈R</sub> stands for the set of distinct formulas in (C<sub>r</sub>)<sub>r∈R</sub>. Abandoning the requirement that (A<sub>r</sub>)<sub>r∈R</sub> partitions O (at least in the sense of a crisp partitioning) allows those rulesets to be generalized also to fuzzy antecedents. For Boolean antecedents, however, this requirement entails a natural definition of the validity of a whole classification ruleset R for an object x. Assume that all information about x conveyed by R is conveyed by the single rule r covering x (i.e., with A<sub>r</sub> valid for x). Then the validity of R for x can be defined to coincide with the validity of A<sub>r</sub> → C<sub>r</sub> for that r, which in turn equals the validity of C<sub>r</sub> for x.</p>
      <p>As far as the Boolean predicate logic is concerned, generalized quantifiers both for symmetric and for asymmetric rules were studied in the 1970s within the framework of the observational logic [13], which is a Boolean predicate logic with generalized quantifiers. For a set of data about n objects, the truth evaluation of the Boolean predicate ϕ on those objects is a vector ‖ϕ‖ ∈ {0, 1}<sup>n</sup>. The truth evaluation of a sentence (Qx)(ϕ<sub>1</sub>(x), …, ϕ<sub>m</sub>(x)), consisting of m Boolean predicates ϕ<sub>1</sub>, …, ϕ<sub>m</sub> and an m-ary generalized quantifier Q, is the function value</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>\|(Qx)(\varphi_1(x), \ldots, \varphi_m(x))\| = \mathrm{Tf}_Q(\|\varphi_1\|, \ldots, \|\varphi_m\|)</tex-math>
      </disp-formula>
      <p>of a {0, 1}-valued function Tf<sub>Q</sub> on the set of m-column binary matrices, which is called the truth function of the quantifier Q. Observational logic underlies one of the earliest methods for the extraction of general rules from data, called General Unary Hypotheses Automaton (GUHA). In GUHA, the truth function Tf<sub>Q</sub> of a generalized quantifier Q is always a function of the 4-fold table</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>\begin{array}{c|cc} &amp; S'_r \;/\; C_r &amp; \neg S'_r \;/\; \neg C_r \\ \hline S_r \;/\; A_r &amp; a &amp; b \\ \neg S_r \;/\; \neg A_r &amp; c &amp; d \end{array}</tex-math>
      </disp-formula>
      <p>Hence, Tf<sub>Q</sub> is a {0, 1}-valued function on quadruples of nonnegative integers. For symmetric rules, GUHA uses quantifiers fulfilling</p>
      <disp-formula>
        <label>(3)</label>
        <tex-math>a' \ge a \wedge b' \le b \wedge c' \le c \wedge d' \ge d \wedge \mathrm{Tf}_Q(a, b, c, d) = 1 \;\rightarrow\; \mathrm{Tf}_Q(a', b', c', d') = 1.</tex-math>
      </disp-formula>
      <p>They are called associational quantifiers. For asymmetric rules, GUHA uses quantifiers fulfilling the stronger condition</p>
      <disp-formula>
        <label>(4)</label>
        <tex-math>a' \ge a \wedge b' \le b \wedge \mathrm{Tf}_Q(a, b, c, d) = 1 \;\rightarrow\; \mathrm{Tf}_Q(a', b', c', d') = 1,</tex-math>
      </disp-formula>
      <p>which are called implicational quantifiers. This condition also covers the frequently encountered association rules [1, 6, 40]. Because methods for the extraction of association rules have been developed outside the framework of observational logic, the terminology is a bit confusing here: although association rules are asymmetric, their name evokes the quantifier for the symmetric ones.</p>
      <p>Orthogonally to the typology according to the semantics of the extracted rules, all extraction methods can be divided into two large groups:</p>
      <p>– Methods that extract logical rules from data directly, without any intermediate formal representation of the discovered knowledge. Such methods have always formed the mainstream of the extraction of Boolean rules. They range from the observational logic methods [13] and the method AQ [30, 31] in the late 1970s, through the extraction of association rules [1, 40] and the method CN2 [4] (which relies on a paradigm similar to that of AQ), to recent methods based on inductive logic programming [5, 33] and genetic algorithms [9]. They also include important methods for fuzzy rules, in particular ANFIS [22, 23] and NEFCLASS [34, 35], fuzzy generalizations of observational logic [18, 19], and a recent method based on the fuzzy transform [36].</p>
      <p>– Methods that employ some intermediate representation of the extracted knowledge, useful by itself. This group includes two important kinds of methods: classification trees [3, 37] and methods based on artificial neural networks (ANN). The latter are used both for Boolean and for fuzzy rules [7, 21, 39] (cf. also the survey papers [32, 38]).</p>
    </sec>
    <sec id="sec-2">
      <title>Existing measures for classification rulesets</title>
      <sec id="sec-2-1">
        <title>Inaccuracy and imprecision</title>
        <p>A survey of measures of quality for classification rulesets (with possibly fuzzy antecedents) has been given in the monograph [15]. All measures have been divided there into four groups: inaccuracy, imprecision, inseparability and resemblance. Space limitations allow us to recall here only the main representatives of the more important groups.</p>
        <p>Inaccuracy measures the discrepancy between the true class of the considered objects and the class predicted by the ruleset. Its most frequently encountered representative is the quadratic score (also called Brier score):</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>\mathrm{Inacc} = \frac{1}{|O|} \sum_{x \in O} \sum_{C \in \{C_r\}_{r \in R}} \bigl(\delta_C(x) - \hat{\delta}_C(x)\bigr)^2,</tex-math>
        </disp-formula>
        <p>where | | denotes cardinality, O is the considered set of objects, δ<sub>C</sub>(x) ∈ {0, 1} is the validity of the proposition C for x ∈ O, and δ̂<sub>C</sub>(x) is the agreement between C and the class predicted for x by R. In the general case of a fuzzy logic, <inline-formula><tex-math>\hat{\delta}_C(x) = \max_{C_r = C} \|A_r\|_x</tex-math></inline-formula>, with ‖A<sub>r</sub>‖<sub>x</sub> ∈ ⟨0, 1⟩ denoting the truth grade of A<sub>r</sub> for x.</p>
        <p>Imprecision measures the discrepancy between the probability distribution of the classes, conditioned on the values of attributes occurring in antecedents, and the class predicted by the ruleset. Its most common representative is</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math>\mathrm{Impr} = \frac{1}{|O|} \sum_{x \in O} \sum_{C \in \{C_r\}_{r \in R}} \bigl(\delta_C(x) - \hat{\delta}_C(x)\bigr)\bigl(1 - \hat{\delta}_C(x)\bigr)^2.</tex-math>
        </disp-formula>
        <p>In the particular case of classification rulesets with Boolean antecedents, δ̂<sub>C</sub>(x) ∈ {0, 1}. Each misclassified object therefore contributes 2 to the double sum in (5) and 1 to that in (6), whereas a correctly classified object contributes 0 to both. Hence, some algebra allows (5)–(6) to be simplified substantially:</p>
        <disp-formula>
          <label>(7)</label>
          <tex-math>\mathrm{Inacc} = \frac{2|O^-|}{|O|} = 1 - \frac{|O^+| - |O^-|}{|O|}, \qquad \mathrm{Impr} = \frac{|O^-|}{|O|} = 1 - \frac{|O^+|}{|O|},</tex-math>
        </disp-formula>
        <p>where</p>
        <disp-formula>
          <label>(8)</label>
          <tex-math>O^+ = \{x \in O : R \text{ is valid for } x\}, \qquad O^- = \{x \in O : R \text{ is not valid for } x\}.</tex-math>
        </disp-formula>
        <p>This shows that, in the case of Boolean antecedents, the quadratic score is sufficient to describe also the imprecision. It also suggests an approach to extending those measures to general rulesets: to use (7)–(8) as the definition of measures (5)–(6). More generally, consider any measure of quality of classification rulesets with Boolean antecedents (e.g., any measure surveyed in [15]) that can be reformulated by means of O<sup>+</sup> and O<sup>−</sup>. Such a measure can be extended in such a way that the reformulation is used as the definition of that measure for general rulesets.</p>
        <p>For sets of asymmetric rules, the notion of covering an object by a rule, which was recalled in Section 2, can also be generalized. Notice, however, that for fuzzy antecedents, the validity of A<sub>r</sub>, r ∈ R, is a fuzzy set on O. Consequently, the set O<sub>R</sub> of objects covered by R is a fuzzy set on O with the membership function</p>
        <disp-formula>
          <label>(9)</label>
          <tex-math>\mu_R(x) = \|(\exists r \in R)\, A_r\|_x = \max_{r \in R} \|A_r\|_x.</tex-math>
        </disp-formula>
        <p>Observe that according to (9), O<sub>R</sub> = O for classification rulesets with Boolean antecedents. Therefore, various generalizations of classification measures to general rulesets of asymmetric rules are possible. Wherever O occurs in the definition of a measure for classification rulesets, either O or O<sub>R</sub> can occur in its general definition, provided O<sub>R</sub> ≠ ∅. To allow unified treatment of symmetric and asymmetric rules, the concept of covering an object by a rule will be extended also to symmetric rules: an object x is covered by S<sub>r</sub> ∼ S′<sub>r</sub> if either S<sub>r</sub> or S′<sub>r</sub> is valid for x. Hence, a counterpart of (9) for a set R of symmetric rules is a fuzzy set with the membership function</p>
        <disp-formula>
          <label>(10)</label>
          <tex-math>\mu_R(x) = \|(\exists r \in R)(S_r \vee S'_r)\|_x = \max_{r \in R} \max(\|S_r\|_x, \|S'_r\|_x).</tex-math>
        </disp-formula>
      </sec>
      <sec id="sec-2-2">
        <title>ROC curves and coverage graphs</title>
        <p>As was already mentioned in the introduction, the extracted ruleset can substantially depend on tunable parameters of the employed method. So far, this dependence has been systematically studied only for dichotomous classification with R = {A → C, ¬A → ¬C}. In that case, putting A<sub>r</sub> = A, C<sub>r</sub> = C allows the information about the validity of A and C for O to be again summarized by means of the 4-fold table (2), which also depends on the parameter values. The influence of the parameter values on the result of dichotomous classification is usually investigated by means of the measures sensitivity = a/(a + c) and specificity = d/(b + d) [15]. Connecting the points (1 − specificity, sensitivity) = (b/(b + d), a/(a + c)) for the considered parameter values forms a curve with graph in the unit square. This curve is called the receiver operating characteristic (ROC), after the field in which such curves were first in routine use. In machine learning, a modified version of those curves has been proposed, in which the points connected for the considered parameter values are (b, a) [10]. The graph of such a curve then lies in the rectangle with vertices (0, 0) and (b + d, a + c), and is called a coverage graph.</p>
        <p>The graphs of ROC curves and coverage graphs can provide information about the influence of parameter values not only on the sensitivity and specificity, but also on other measures. It suffices to complement the graph with isolines of the measure and to investigate their intersections with the original curve [10].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Three extensions to more general kinds of rules</title>
      <sec id="sec-3-1">
        <title>Validity of a general ruleset for an object</title>
        <p>According to (8), the proposed way of extending measures of quality from classification rulesets with Boolean antecedents to general rulesets requires generalizing the concept of the validity of a general ruleset for an object. However, there are multiple possibilities for such a generalization. Indeed, at least any of the following points of view is possible:</p>
        <p>– Boolean validity of the ruleset based on simultaneous validity of all covering rules. According to this point of view, the validity of a ruleset R for a covered object x is a Boolean property expressing the simultaneous validity of all rules that cover x.</p>
        <p>… are generalizations of (6).</p>
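        <p>Two of the points of view on the validity of a general ruleset, the simultaneous validity of all covering rules and the counting condition of (15)–(16), can be contrasted in a minimal Python sketch. The sketch assumes Boolean covering and validity, supplied as per-object lists of flags. The names <monospace>covers</monospace>, <monospace>valid</monospace>, <monospace>split_simultaneous</monospace>, <monospace>split_counting</monospace> and <monospace>inacc_impr</monospace> are hypothetical and are not taken from [17]. The membership function (9) is included for fuzzy truth grades as well.</p>
        <preformat preformat-type="code">
```python
from typing import List, Sequence, Tuple

# Hypothetical input format (an assumption, not from the paper):
# covers[i][j] is True if rule j covers object i, and
# valid[i][j] is True if rule j is valid for object i.
Flags = List[List[bool]]


def coverage_membership(grades: Sequence[float]) -> float:
    """Membership of one object x in O_R as in (9): the maximum over r in R
    of ||A_r||_x. For symmetric rules, pass max(||S_r||_x, ||S'_r||_x) per
    rule, as in (10). An empty ruleset covers nothing."""
    return max(grades, default=0.0)


def split_simultaneous(covers: Flags, valid: Flags) -> Tuple[int, int]:
    """|O+| and |O-| when a covered object is in O+ iff all rules that cover
    it are valid for it (Boolean covering and validity)."""
    n_pos = n_neg = 0
    for cov_x, val_x in zip(covers, valid):
        covering = [v for c, v in zip(cov_x, val_x) if c]
        if not covering:  # x is not covered by R, so it is in neither set
            continue
        if all(covering):
            n_pos += 1
        else:
            n_neg += 1
    return n_pos, n_neg


def split_counting(covers: Flags, valid: Flags) -> Tuple[int, int]:
    """|O+| and |O-| following the counting condition of (15)-(16) in the
    Boolean case: a covered object is in O+ iff more of its covering rules
    are valid than invalid for it, and in O- otherwise."""
    n_pos = n_neg = 0
    for cov_x, val_x in zip(covers, valid):
        n_valid = sum(1 for c, v in zip(cov_x, val_x) if c and v)
        n_invalid = sum(1 for c, v in zip(cov_x, val_x) if c and not v)
        if n_valid + n_invalid == 0:  # mu_R(x) = 0: x is not covered
            continue
        if n_valid > n_invalid:
            n_pos += 1
        else:
            n_neg += 1
    return n_pos, n_neg


def inacc_impr(n_pos: int, n_neg: int, n_objects: int) -> Tuple[float, float]:
    """Measures (7): Inacc = 2|O-|/|O| and Impr = |O-|/|O|. The right-hand
    forms in (7) agree with these only if |O+| + |O-| = |O|."""
    return 2 * n_neg / n_objects, n_neg / n_objects


if __name__ == "__main__":
    # Classification ruleset {A -> C, not A -> not C} on four objects:
    # x1: A, C; x2: A, not C; x3: not A, not C; x4: not A, C.
    # A non-covering implication is vacuously valid.
    covers = [[True, False], [True, False], [False, True], [False, True]]
    valid = [[True, True], [False, True], [True, True], [True, False]]
    print(split_simultaneous(covers, valid))  # (2, 2)
    print(inacc_impr(2, 2, 4))                # (1.0, 0.5)
    # One object covered by three rules, two of them valid for it:
    print(split_simultaneous([[True] * 3], [[True, True, False]]))  # (0, 1)
    print(split_counting([[True] * 3], [[True, True, False]]))      # (1, 0)
```
        </preformat>
        <p>With exactly one covering rule per object, as in a classification ruleset with Boolean antecedents, both splits reduce to (8) and the measures to (7). They start to differ once several rules cover the same object, as the last two lines of the example show.</p>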
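        <disp-formula>
          <label>(15)</label>
          <tex-math>O^+ = \Bigl\{x \in O : \mu_R(x) > 0 \wedge \sum_{r \in R} \|r \text{ covers } x \wedge r \text{ is valid for } x\| > \sum_{r \in R} \|r \text{ covers } x \wedge \neg r \text{ is valid for } x\|\Bigr\},</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(16)</label>
          <tex-math>O^- = \Bigl\{x \in O : \mu_R(x) > 0 \wedge \sum_{r \in R} \|r \text{ covers } x \wedge r \text{ is valid for } x\| \le \sum_{r \in R} \|r \text{ covers } x \wedge \neg r \text{ is valid for } x\|\Bigr\},</tex-math>
        </disp-formula>
        <p>where the truth grade ‖r covers x ∧ ¬r is valid for x‖ is again evaluated according to (14), replacing r with …</p>
        <disp-formula>
          <label>(21), (22)</label>
          <tex-math>\ldots = 1 - \frac{\sum_{x \in O} \mu^{+}(x)}{\sum_{x \in O} \mu_R(x)}</tex-math>
        </disp-formula>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extensions of ROC curves to more general kinds of rules</title>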
      <sec id="sec-4-1">
        <title>Generalized coverage graphs and ROC curves</title>
        <p>Observe that in the case of Boolean classification with R = {A → C, ¬A → ¬C}, the information about the validity of R for objects x ∈ O can also be viewed as information about the validity of a ruleset R′ = {A → C}. However, R′ is no longer a classification ruleset, but only a general one. It can therefore be described only by means of the sets O<sub>R</sub>, O<sup>+</sup>, O<sup>−</sup> introduced above. In particular, |O<sup>+</sup>| = a and |O<sup>−</sup>| = b. This suggests generalizing the coverage graphs introduced in Section 3 to general rulesets by means of a curve connecting the points (|O<sup>−</sup>|, |O<sup>+</sup>|) for each of the values of the considered parameters. For a generalization of ROC curves to general rulesets, those points have to be scaled to the unit square. Since the resulting curve will be used to investigate the dependence on parameter values, the scaling factor itself must be independent of those values. The only available factor fulfilling this condition is the number of objects |O|. The other available factors, |O<sub>R</sub>|, |O<sup>+</sup>| and |O<sup>−</sup>|, depend on the evaluations ‖S<sub>r</sub>‖ and ‖S′<sub>r</sub>‖, or ‖A<sub>r</sub>‖ and ‖C<sub>r</sub>‖, which in turn depend on the parameter values. Consequently, the proposed generalization of ROC curves will connect the points (|O<sup>−</sup>|/|O|, |O<sup>+</sup>|/|O|).</p>
        <p>For practical construction of the proposed generalization of ROC curves, the following proposition, proven in [17], can be quite useful:</p>
        <p>Proposition 1. Let the covering of individual objects with individual rules be a Boolean property (i.e., let the set of rules covering a particular object x be a crisp subset of R). Then, irrespective of which of the above points of view of ruleset validity is adopted, there always exist a constant c ∈ (0, 1⟩ and an increasing bijection g : ⟨0, c⟩ → ⟨0, 1⟩ such that</p>
        <disp-formula>
          <label>(23)</label>
          <tex-math>|O^+| + |O^-| \le \max\Bigl(1, \max_{x \in \langle 0, c \rangle} \bigl(x + g^{-1}(1 - g(x))\bigr)\Bigr)\,|O|.</tex-math>
        </disp-formula>
        <p>Moreover, in the particular cases of Boolean logic and of all three fundamental fuzzy logics (Łukasiewicz, Gödel, product), (23) holds with c = 1 and g equal to the identity, i.e.,</p>
        <disp-formula>
          <label>(24)</label>
          <tex-math>|O^+| + |O^-| \le |O|.</tex-math>
        </disp-formula>
        <p>Thus in those cases, the points (|O<sup>−</sup>|/|O|, |O<sup>+</sup>|/|O|) forming the generalization of ROC curves lie on or below the diagonal connecting (0, 1) and (1, 0).</p>
        <p>The proposition is illustrated in Figure 1, together with isolines of the three example measures introduced in (20)–(22). Observe that the isolines of Impr<sub>2</sub> depend on the relationship between the three cardinalities |O<sup>+</sup>| = Σ<sub>x∈O</sub> μ<sup>+</sup>(x), |O<sup>−</sup>| = Σ<sub>x∈O</sub> μ<sup>−</sup>(x) and |O<sub>R</sub>| = Σ<sub>x∈O</sub> μ<sub>R</sub>(x). The isolines depicted in Figure 1(c) correspond to the relationship |O<sub>R</sub>| = |O<sup>+</sup>| + |O<sup>−</sup>|, which holds in Łukasiewicz logic (and thus, in particular, also in Boolean logic).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experimentally testing the approach</title>
        <p>So far, the proposed approach has been tested experimentally with six rules extraction methods on three benchmark data sets, as well as on data from one real-world knowledge discovery task [20]. For each method, 1–3 parameters were tuned, with values chosen from among 2–10 possibilities. For some data sets, some combinations of parameter values did not extract any rules. Whenever a particular combination of parameter values extracted a nonempty ruleset from the considered data, that ruleset was tested on those data by means of 10-fold cross-validation. Consequently, the number of rulesets extracted from each data set varied between 1000 and 1500.</p>
        <p>As a very brief illustration, Figure 2 shows the proposed generalization of ROC curves for two rulesets extracted from the best-known benchmark set, the iris data. These data were originally used in the 1930s by R.A. Fisher [8], and the rulesets were extracted by means of the GUHA quantifier founded implication. This quantifier, denoted →<sub>s,θ</sub> with s, θ ∈ (0, 1⟩, has its truth function Tf<sub>→s,θ</sub> defined in such a way that the rule A<sub>r</sub> →<sub>s,θ</sub> C<sub>r</sub> is valid exactly for those data that satisfy two conditions. First, the conditional probability p(C<sub>r</sub>|A<sub>r</sub>) of the validity of C<sub>r</sub> conditioned on A<sub>r</sub>, estimated with the unbiased estimate a/(a + b), is at least θ. Second, A<sub>r</sub> and C<sub>r</sub> are simultaneously valid in at least the proportion s of the data [13]. Hence, Tf<sub>→s,θ</sub> = 1 iff a/(a + b) ≥ θ and a/(a + b + c + d) ≥ s. As was pointed out in [14], rules with this quantifier are actually association rules with support s and confidence θ. Each curve corresponds to changing only one of the parameters s, θ, while the value of the other is fixed.</p>
        <p>Fig. 2. Example of generalized ROC curves for rulesets extracted from the iris data by means of the GUHA quantifier founded implication.</p>
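        <p>A minimal Python sketch of the founded implication and of a single point of the proposed generalized ROC curve follows. It assumes one Boolean rule A → C whose antecedent and consequent are given as lists of truth values over the objects. The function names are hypothetical, and this is not the GUHA implementation behind Figure 2. For R′ = {A → C}, |O<sup>+</sup>| = a and |O<sup>−</sup>| = b, so the point is (b/n, a/n) with n = a + b + c + d.</p>
        <preformat preformat-type="code">
```python
from typing import Sequence, Tuple


def fourfold(ant: Sequence[bool], cons: Sequence[bool]) -> Tuple[int, int, int, int]:
    """4-fold table (2) of a rule A -> C over the data:
    a = #(A and C), b = #(A and not C), c = #(not A and C), d = #(neither)."""
    a = b = c = d = 0
    for x_a, x_c in zip(ant, cons):
        if x_a and x_c:
            a += 1
        elif x_a:
            b += 1
        elif x_c:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a: int, b: int, c: int, d: int,
                        s: float, theta: float) -> bool:
    """Truth function of the founded implication with support s and
    confidence theta: 1 iff a/(a+b) >= theta and a/(a+b+c+d) >= s."""
    if a + b == 0:
        # A is never valid, so the support a/(a+b+c+d) = 0 is below any s > 0.
        return False
    return a / (a + b) >= theta and a / (a + b + c + d) >= s


def generalized_roc_point(ant: Sequence[bool],
                          cons: Sequence[bool]) -> Tuple[float, float]:
    """Point (|O-|/|O|, |O+|/|O|) for the ruleset R' = {A -> C},
    using |O+| = a and |O-| = b."""
    a, b, c, d = fourfold(ant, cons)
    n = a + b + c + d
    return b / n, a / n


if __name__ == "__main__":
    ant = [True, True, True, False, False]
    cons = [True, True, False, True, False]
    table = fourfold(ant, cons)
    print(table)                                          # (2, 1, 1, 1)
    print(founded_implication(*table, s=0.3, theta=0.6))  # True
    print(founded_implication(*table, s=0.3, theta=0.7))  # False
    print(generalized_roc_point(ant, cons))               # (0.2, 0.4)
```
        </preformat>
        <p>Recomputing the point for each considered value of s or θ and connecting the results gives a curve of the kind described above. In this Boolean single-rule setting, the two coordinates sum to at most 1, in line with (24). For rulesets with several rules, |O<sup>+</sup>| and |O<sup>−</sup>| would instead come from one of the points of view on ruleset validity, which this sketch does not cover.</p>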
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>The paper has dealt with quality measures of rules
extracted from data, though not in the usual context
of individual rules, but in the context of whole
rulesets. Three kinds of extensions of measures already in
use for classification rulesets have been proposed. In
addition, the concept of ROC curves has been
generalized to enable investigating the dependence of general
rulesets on the values of parameters of the extraction
method.</p>
      <p>The paper actually discusses some general aspects
related to an ongoing investigation into the possibility
of reflecting the uncertain validity of rulesets extracted from
data when measuring their quality. The outcomes of
that investigation are intended to be published
elsewhere [17]. They comprise a theoretical elaboration of
the last proposed kind of extensions of ruleset quality
measures. They also comprise the results of extensive experimental
tests on rulesets extracted from benchmark and
real-world data sets by means of six methods intended
to cover as broad a spectrum of rules extraction
methods as possible. Those results indicate that the approach is
feasible and can contribute to the ultimate objective
of quality measures: to allow comparing the knowledge
extracted with different data mining methods and
investigating how the extracted knowledge depends on
the values of their parameters.</p>
      <sec id="sec-5-1">
        <title>Acknowledgment</title>
        <p>The research reported in this paper has been supported by the grant No. 201/08/1744 of the Grant Agency of the Czech Republic and partially supported by the Institutional Research Plan AV0Z10300504.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, Menlo Park, 1996.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. S.D. Bay and M.J. Pazzani. Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5:213–246, 2001.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. P. Clark and R. Boswell. Rule induction with CN2: Some recent improvements. In Machine Learning – EWSL-91, pages 151–163. Springer Verlag, New York, 1991.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. L. De Raedt. Interactive Theory Revision: An Inductive Logic Programming Approach. Academic Press, London, 1992.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. D. Dubois, Hüllermeier, and H. Prade. A systematic approach to the assessment of fuzzy association rules. Data Mining and Knowledge Discovery, 13:167–192, 2006.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. W. Duch, R. Adamczak, and K. Grabczewski. A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks, 11:277–306, 2000.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. R.A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188, 1936.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. A.A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer Verlag, Berlin, 2002.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. J. Fürnkranz and P.A. Flach. ROC ’n’ rule learning – towards a better understanding of covering algorithms. Machine Learning, 58:39–77, 2005.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. L. Geng and H.J. Hamilton. Choosing the right lens: Finding what is interesting in data mining. In F. Guillet and H.J. Hamilton, editors, Quality Measures in Data Mining, pages 3–24. Springer Verlag, Berlin, 2007.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. P. Hájek. Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht, 1998.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. P. Hájek and T. Havránek. Mechanizing Hypothesis Formation. Springer Verlag, Berlin, 1978.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. P. Hájek and M. Holeňa. Formal logics of discovery and hypothesis formation by machine. Theoretical Computer Science, 292:345–357, 2003.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. D.J. Hand. Construction and Assessment of Classification Rules. John Wiley and Sons, New York, 1997.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. R.J. Hilderman and T. Peckham. Statistical methodologies for mining potentially interesting contrast sets. In F. Guillet and H.J. Hamilton, editors, Quality Measures in Data Mining, pages 153–177. Springer Verlag, Berlin, 2007.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. M. Holeňa. Measures of ruleset quality capable to represent uncertain validity. Submitted to International Journal of Approximate Reasoning.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. M. Holeňa. Fuzzy hypotheses for GUHA implications. Fuzzy Sets and Systems, 98:101–125, 1998.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. M. Holeňa. Fuzzy hypotheses testing in the framework of fuzzy logic. Fuzzy Sets and Systems, 145:229–252, 2004.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. M. Holeňa. Neural networks for extraction of fuzzy logic rules with application to EEG data. In B. Ribeiro, R.F. Albrecht, and A. Dobnikar, editors, Adaptive and Natural Computing Algorithms, pages 369–372. Springer Verlag, Wien, 2005.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. M. Holeňa. Piecewise-linear neural networks and their relationship to rule extraction from data. Neural Computation, 18:2813–2853, 2006.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. J.S.R. Jang. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23:665–685, 1993.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. J.S.R. Jang and C.T. Sun. Neuro-fuzzy modeling and control. The Proceedings of the IEEE, 83:378–406, 1995.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. K.A. Kaufman and R.S. Michalski. An adjustable description quality measure for pattern discovery using the AQ methodology. Journal of Intelligent Information Systems, 14:199–216, 2000.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>25. E.P. Klement, R. Mesiar, and E. Pap. Triangular Norms. Kluwer Academic Publishers, Dordrecht, 2000.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>26. S. Lallich, O. Teytaud, and E. Prudhomme. Association rule interestingness: Measure and statistical validation. In F. Guillet and H.J. Hamilton, editors, Quality Measures in Data Mining, pages 251–275. Springer Verlag, Berlin, 2007.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>27. P. Lenca, B. Vaillant, P. Meyer, and S. Lallich. Association rule interestingness measures: Experimental and theoretical studies. In F. Guillet and H.J. Hamilton, editors, Quality Measures in Data Mining, pages 51–76. Springer Verlag, Berlin, 2007.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>28. L. Lerman and J. Azé. Une mesure probabiliste contextuelle discriminante de qualité des règles d’association. In EGC 2003: Extraction et Gestion des Connaissances, pages 247–263. Hermes Science Publications, Lavoisier, 2003.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>29. K. McGarry. A survey of interestingness measures for knowledge discovery. Knowledge Engineering Review, 20:39–61, 2005.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>30. R.S. Michalski. Knowledge acquisition through conceptual clustering: A theoretical framework and algorithm for partitioning data into conjunctive concepts. International Journal of Policy Analysis and Information Systems, 4:219–243, 1980.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>31. R.S. Michalski and K.A. Kaufman. Learning patterns in noisy data. In Machine Learning and Its Applications, pages 22–38. Springer Verlag, New York, 2001.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>32. S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks, 11:748–768, 2000.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>33. S. Muggleton. Inductive Logic Programming. Academic Press, London, 1992.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>34. D. Nauck. Fuzzy data analysis with NEFCLASS. International Journal of Approximate Reasoning, 32:103–130, 2002.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>35. D. Nauck and R. Kruse. NEFCLASS-X: A neuro-fuzzy tool to build readable fuzzy classifiers. BT Technology Journal, 3:180–192, 1998.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>36. V. Novák, I. Perfilieva, A. Dvořák, C.Q. Chen, Q. Wei, and P. Yan. Mining pure linguistic associations from numerical data. To appear in International Journal of Approximate Reasoning.</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>37. J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco, 1992.</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>38. A.B. Tickle, R. Andrews, M. Golea, and J. Diederich. The truth will come to light: Directions and challenges in extracting rules from trained artificial neural networks. IEEE Transactions on Neural Networks, 9:1057–1068, 1998.</mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>39. H. Tsukimoto. Extracting rules from trained neural networks. IEEE Transactions on Neural Networks, 11:333–389, 2000.</mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>40. M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New parallel algorithms for fast discovery of association rules. Data Mining and Knowledge Discovery, 1:343–373, 1997.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>