<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fuzzy classification rules based on similarity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Štefka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Nuclear Science and Physical Engineering, Czech Technical University</institution>
          ,
          <addr-line>Trojanova 13, 120 00 Prague</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodárenskou věží 2, 182 07 Prague</addr-line>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>The paper deals with the aggregation of classification rules by means of fuzzy integrals, in particular with the fuzzy measures employed in that aggregation. It points out that the kinds of fuzzy measures commonly encountered in this context do not take into account the diversity of classification rules. As a remedy, a new kind of fuzzy measures is proposed, called similarity-aware measures, and several useful properties of such measures are proven. Finally, results of extensive experiments on a number of benchmark datasets are reported, in which a particular similarity-aware measure was applied to a combination of Choquet or Sugeno integrals with three different ways of creating ensembles of classification rules. In the experiments, the new measure was compared with the traditional Sugeno λ-measure, to which it was clearly superior.</p>
      </abstract>
      <funding-group>
        <funding-statement>The research reported in this paper has been supported by the Czech Science Foundation (GA ČR) grant P202/11/1368.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>Introduction</title>
      <p>Logical formulas of specific kinds, usually called rules, are a traditional way of formally representing knowledge. Therefore, it is not surprising that they are also the most frequent representation of the knowledge discovered in data mining.</p>
      <p>The most natural base for differentiating between existing rule extraction methods is the syntax and semantics of the extracted rules [10]. Syntactical differences between them are, however, not very deep because, principally, any rule r from a ruleset R has one of the forms S<sub>r</sub> ∼ S′<sub>r</sub>, or A<sub>r</sub> → C<sub>r</sub>, where S<sub>r</sub>, S′<sub>r</sub>, A<sub>r</sub> and C<sub>r</sub> are formulas of the considered logic, and ∼, → are symbols of the language of that logic. The difference between both forms concerns semantic properties of the symbols ∼ and →: S<sub>r</sub> ∼ S′<sub>r</sub> is symmetric with respect to S<sub>r</sub>, S′<sub>r</sub> in the sense that its validity always coincides with that of S′<sub>r</sub> ∼ S<sub>r</sub>, whereas A<sub>r</sub> → C<sub>r</sub> is not symmetric with respect to A<sub>r</sub>, C<sub>r</sub> in that sense. In the case of a propositional logic, ∼ and → are the connectives equivalence (≡) and implication, respectively, whereas in the case of a predicate logic, they are generalized quantifiers. To distinguish the formulas involved in the asymmetric case, A<sub>r</sub> is called the antecedent and C<sub>r</sub> the consequent of r.</p>
      <p>More important is the semantics of the rules (cf. [5]), especially the difference between rules of the Boolean logic and rules of a fuzzy logic. Due to the semantics of Boolean and fuzzy formulas, the former are valid for crisp sets of objects, whereas the validity of the latter is a fuzzy set on the universe of all considered objects. Boolean rulesets are extracted more frequently, especially some specific types of them, such as classification rulesets [6, 9]. Those are sets of implications such that {A<sub>r</sub>}<sub>r∈R</sub> and {C<sub>r</sub>}<sub>r∈R</sub> partition the set O of considered objects, where {·}<sub>r∈R</sub> stands for the set of distinct formulas in (·)<sub>r∈R</sub>. Abandoning the requirement that {A<sub>r</sub>}<sub>r∈R</sub> partitions O (at least in the sense of a crisp partitioning) allows those rulesets to be generalized also to fuzzy antecedents [15]. For Boolean antecedents, however, this requirement entails a natural definition of the validity of a whole classification ruleset R for an object x. Assuming that all information about x conveyed by R is conveyed by the single rule r covering x (i.e., with A<sub>r</sub> valid for x), the validity of R for x can be defined to coincide with the validity of A<sub>r</sub> → C<sub>r</sub> for that r, which in turn equals the validity of C<sub>r</sub> for x.</p>
      <p>It is also possible to combine several existing classification rules into a new one. Such aggregation can be either static, i.e., the result is the same for all inputs, or dynamic, where it is adapted to the currently classified input [11, 19]. In the aggregation of classification rules, we usually try to create a team of rules that are not similar. This property is called diversity [14]. There are many methods for building a diverse team of classifiers [2, 3, 16].</p>
      <p>One popular aggregation operator is the fuzzy integral [7, 12, 13, 17]. It aggregates the outputs of the individual classification rules with respect to a fuzzy measure. The role of fuzzy measures in the aggregation of classification rules, in particular their role with respect to the diversity of the rules, was the subject of the research reported in this paper.</p>
      <p>The following section recalls the fuzzy integrals and fuzzy measures encountered in the aggregation of classification rules. In Section 3, which is the key section of the paper, a new fuzzy measure, called similarity-aware measure, is introduced and its theoretical properties are studied. Finally, in Section 4, results of extensive experiments and comparison with the traditional Sugeno λ-measure are reported.</p>
    </sec>
    <sec id="sec-2">
      <label>2</label>
      <title>Fuzzy integrals and measures in classification rules aggregation</title>
      <p>Several definitions of a fuzzy integral exist in the literature; among them, the Choquet integral and the Sugeno integral are used most often. The role played in usual integration by additive measures (such as probability or Lebesgue measure) is in fuzzy integration played by fuzzy measures. In this section, basic concepts pertaining to different kinds of fuzzy measures will be recalled, as well as the definitions of Choquet and Sugeno integrals. Due to the intended context of aggregation of classification rules, we restrict attention to [0, 1]-valued functions on finite sets.</p>
      <p>Definition 1. A fuzzy measure μ on a finite set U = {u<sub>1</sub>, …, u<sub>r</sub>} is a function on the power set of U,
        <disp-formula><tex-math>\mu : \mathcal{P}(U) \to [0, 1]</tex-math></disp-formula>
        fulfilling:</p>
      <list list-type="order">
        <list-item><p>the boundary conditions
          <disp-formula><label>(1)</label><tex-math>\mu(\emptyset) = 0, \quad \mu(U) = 1,</tex-math></disp-formula></p></list-item>
        <list-item><p>the monotonicity
          <disp-formula><label>(2)</label><tex-math>A \subseteq B \Rightarrow \mu(A) \le \mu(B).</tex-math></disp-formula></p></list-item>
      </list>
      <p>The values μ(u<sub>1</sub>), …, μ(u<sub>r</sub>) are called fuzzy densities.</p>
      <p>Definition 2. The Choquet integral of a function f : U → [0, 1], f(u<sub>i</sub>) = f<sub>i</sub>, i = 1, …, r, with respect to a fuzzy measure μ is defined as:
        <disp-formula><label>(3)</label><tex-math>(\mathrm{Ch}) \int f \, d\mu = \sum_{i=1}^{r} \left( f_{\langle i \rangle} - f_{\langle i-1 \rangle} \right) \mu(A_{\langle i \rangle}),</tex-math></disp-formula>
        where ⟨·⟩ indicates that the indices have been permuted such that 0 = f<sub>⟨0⟩</sub> ≤ f<sub>⟨1⟩</sub> ≤ ⋯ ≤ f<sub>⟨r⟩</sub> ≤ 1, and A<sub>⟨i⟩</sub> = {u<sub>⟨i⟩</sub>, …, u<sub>⟨r⟩</sub>} denotes the set of elements of U corresponding to the (r − i + 1) highest values of f.</p>
      <p>Definition 3. The Sugeno integral of a function f : U → [0, 1], f(u<sub>i</sub>) = f<sub>i</sub>, i = 1, …, r, with respect to a fuzzy measure μ is defined as:
        <disp-formula><label>(5)</label><tex-math>(\mathrm{Su}) \int f \, d\mu = \max_{i=1}^{r} \min\left( f_{\langle i \rangle}, \mu(A_{\langle i \rangle}) \right).</tex-math></disp-formula></p>
      <p>To define a general fuzzy measure in the discrete case, we need to define all its 2<sup>r</sup> values, which is usually very complicated. To overcome this weakness, measures which do not need all the 2<sup>r</sup> values have been developed [7, 17]:</p>
      <p>Definition 4. A fuzzy measure μ on U is called symmetric if
        <disp-formula><label>(6)</label><tex-math>|A| = |B| \Rightarrow \mu(A) = \mu(B) \quad \text{for } A, B \subseteq U,</tex-math></disp-formula>
        where | · | denotes the cardinality of a set.</p>
      <p>Consequently, the value of a symmetric measure depends only on the cardinality of its argument. If a symmetric measure is used in the Choquet integral, the integral reduces to the ordered weighted average operator [17]. However, symmetric measures assume that all elements of U have the same importance, thus they do not take into account the diversity of elements.</p>
      <p>Definition 5. Let ⊥ be a t-conorm. A fuzzy measure μ is called ⊥-decomposable if
        <disp-formula><label>(8)</label><tex-math>\mu(A \cup B) = \mu(A) \perp \mu(B) \quad \text{for disjoint } A, B \subseteq U.</tex-math></disp-formula></p>
      <p>Hence, ⊥-decomposable measures need only the r fuzzy densities, whereas all the other values are computed using the formula (8). Particular cases of this kind of fuzzy measures are additive measures, including probabilistic measures (⊥ being the bounded sum), and the Sugeno λ-measure.</p>
      <p>Definition 6. The Sugeno λ-measure [7, 17] on a finite set U = {u<sub>1</sub>, …, u<sub>r</sub>} is defined by
        <disp-formula><label>(9)</label><tex-math>\mu(A \cup B) = \mu(A) + \mu(B) + \lambda \mu(A) \mu(B)</tex-math></disp-formula>
        for disjoint A, B ⊆ U, and some fixed λ &gt; −1. The value of λ is:</p>
      <list list-type="alpha-lower">
        <list-item><p>computed as the unique non-zero root greater than −1 of the equation
          <disp-formula><label>(10)</label><tex-math>\lambda + 1 = \prod_{i=1}^{r} \left( 1 + \lambda \mu(\{u_i\}) \right)</tex-math></disp-formula>
          if the densities do not sum up to 1;</p></list-item>
        <list-item><p>λ = 0 otherwise.</p></list-item>
      </list>
      <p>If the densities sum up to 1, the fuzzy measure is additive. The Sugeno λ-measure is a ⊥-decomposable measure for the t-conorm
        <disp-formula><label>(11)</label><tex-math>x \perp y = \min(1, x + y + \lambda x y).</tex-math></disp-formula></p>
      <p>A serious weakness of any ⊥-decomposable measure is that the fuzzy measure of a set of two (or more) classification rules is fully determined by the formula (8) for a fixed ⊥. Therefore, if interactions between elements are to be taken into account, then they have to be incorporated directly into the fuzzy measure. That fact motivated our attempt to elaborate the concept of similarity-aware fuzzy measures.</p>
    </sec>
    <sec id="sec-4">
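      <p>The Choquet integral (Definition 2), the Sugeno integral (Definition 3) and the Sugeno λ-measure (Definition 6) can be illustrated by the following minimal Python sketch. It is not taken from the paper: the function names, the representation of subsets of U as lists of indices, the bisection solver for equation (10) and its tolerance are assumptions made only for illustration, and the solver presumes at least two densities, each strictly between 0 and 1.</p>
      <preformat><![CDATA[
```python
from math import prod


def sugeno_lambda(densities, additive_tol=1e-6):
    """Solve equation (10) for the unique non-zero root greater than -1.

    Returns 0.0 when the densities (nearly) sum to 1. Illustrative only:
    assumes at least two densities, each strictly between 0 and 1.
    """
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities strictly inside (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < additive_tol:
        return 0.0

    def h(lam):
        return prod(1.0 + lam * g for g in densities) - (lam + 1.0)

    if total > 1.0:
        # Root lies in (-1, 0): h(-1) > 0 and h < 0 just left of 0.
        lo, hi = -1.0, -0.5
        while h(hi) >= 0.0:
            hi /= 2.0
    else:
        # Root lies in (0, inf): h < 0 just right of 0 and h > 0 far out.
        lo, hi = 1.0, 1.0
        while h(lo) >= 0.0:
            lo /= 2.0
        while h(hi) <= 0.0:
            hi *= 2.0
    for _ in range(200):  # plain bisection; h(lo), h(hi) keep opposite signs
        mid = 0.5 * (lo + hi)
        if (h(mid) >= 0.0) == (h(lo) >= 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_measure(densities, lam):
    """Return mu(A) for index lists A, built by repeated use of formula (9)."""
    def mu(subset):
        value = 0.0
        for i in subset:
            value = value + densities[i] + lam * value * densities[i]
        return value
    return mu


def _ascending(f):
    """Indices of f sorted so that f_<1> <= ... <= f_<r>."""
    return sorted(range(len(f)), key=lambda i: f[i])


def choquet(f, mu):
    """Choquet integral (3) of the values f with respect to mu."""
    order = _ascending(f)
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        total += (f[i] - previous) * mu(order[pos:])  # order[pos:] is A_<i>
        previous = f[i]
    return total


def sugeno(f, mu):
    """Sugeno integral (5) of the values f with respect to mu."""
    order = _ascending(f)
    return max(min(f[i], mu(order[pos:])) for pos, i in enumerate(order))


densities = [0.3, 0.4, 0.5]          # sum is 1.2, so lambda lies in (-1, 0)
lam = sugeno_lambda(densities)
mu = lambda_measure(densities, lam)
f = [0.9, 0.1, 0.5]
print(lam, mu([0, 1, 2]), choquet(f, mu), sugeno(f, mu))
```
]]></preformat>
      <p>With additive densities such as 0.2, 0.3, 0.5 the sketch returns λ = 0 and the Choquet integral coincides with the weighted mean of f, which is a convenient sanity check of the implementation.</p>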
      <sec id="sec-4-1">
        <label>3</label>
        <title>Similarity-aware measures and their properties</title>
        <p>Before introducing similarity-aware measures, let us first recall the notion of similarity [8].</p>
        <p>Definition 7. Let ∧ be a t-norm and let ∼ : U × U → [0, 1] be a fuzzy relation. ∼ is called a similarity on U with respect to ∧ if the following holds for a, b, c ∈ U:
          <disp-formula><label>(12)</label><tex-math>{\sim}(a, a) = 1 \quad \text{(reflexivity)},</tex-math></disp-formula>
          <disp-formula><label>(13)</label><tex-math>{\sim}(a, b) = {\sim}(b, a) \quad \text{(symmetry)},</tex-math></disp-formula>
          <disp-formula><label>(14)</label><tex-math>{\sim}(a, b) \wedge {\sim}(b, c) \le {\sim}(a, c) \quad \text{(transitivity w.r.t. } \wedge \text{)}.</tex-math></disp-formula></p>
        <p>In the context of aggregation of crisp classification rules, we will work with an empirically defined relation, which, for rules φ<sub>k</sub>, φ<sub>l</sub>, is defined as the proportion of equal consequents on some validation set of patterns V ⊂ O,
          <disp-formula><label>(15)</label><tex-math>{\sim}(\varphi_k, \varphi_l) = \frac{\sum_{x \in V} I\left( C_{\varphi_k}(x) = C_{\varphi_l}(x) \right)}{|V|}.</tex-math></disp-formula></p>
        <p>The relation (15) is a similarity with respect to the Łukasiewicz t-norm
          <disp-formula><label>(16)</label><tex-math>\wedge_L(a, b) = \max(a + b - 1, 0),</tex-math></disp-formula>
          because the proportion of patterns on which two rules disagree, 1 − ∼(φ<sub>k</sub>, φ<sub>l</sub>), satisfies the triangle inequality. It is, however, not a similarity with respect to the standard (minimum, Gödel) t-norm
          <disp-formula><label>(17)</label><tex-math>\wedge_S(a, b) = \min(a, b),</tex-math></disp-formula>
          or the product t-norm
          <disp-formula><label>(18)</label><tex-math>\wedge_P(a, b) = ab.</tex-math></disp-formula></p>
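        <p>The behaviour of the empirical relation under the three t-norms can be checked on a small example. The following Python sketch is an illustration only: the function names and the toy predictions of three crisp rules on four validation patterns are hypothetical and do not come from the paper.</p>
        <preformat><![CDATA[
```python
def empirical_similarity(pred_k, pred_l):
    """Proportion of validation patterns on which two crisp rules agree."""
    if len(pred_k) != len(pred_l) or len(pred_k) == 0:
        raise ValueError("prediction lists must be non-empty and equally long")
    agree = sum(1 for a, b in zip(pred_k, pred_l) if a == b)
    return agree / len(pred_k)


def similarity_matrix(predictions):
    """Matrix of pairwise empirical similarities between the rules."""
    return [[empirical_similarity(p, q) for q in predictions]
            for p in predictions]


def lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)


def minimum(a, b):
    return min(a, b)


def product(a, b):
    return a * b


def is_transitive(S, tnorm, eps=1e-12):
    """Check tnorm(S[a][b], S[b][c]) <= S[a][c] for all index triples."""
    idx = range(len(S))
    return all(tnorm(S[a][b], S[b][c]) <= S[a][c] + eps
               for a in idx for b in idx for c in idx)


# Hypothetical class predictions of three rules on four validation patterns.
predictions = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
S = similarity_matrix(predictions)
for name, tnorm in [("Lukasiewicz", lukasiewicz),
                    ("minimum", minimum),
                    ("product", product)]:
    print(name, is_transitive(S, tnorm))
```
]]></preformat>
        <p>In this toy case the first and third rule agree on half of the patterns, while each of them agrees with the second rule on three quarters, so the minimum and product t-norms violate transitivity whereas the Łukasiewicz t-norm does not.</p>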
        <p>The fuzzy integral is a convenient tool for working with the diversity of classification rules: as we compute the fuzzy measure values μ(A<sub>⟨i⟩</sub>), we consider a single rule φ<sub>⟨i⟩</sub> at each step i, and can therefore influence the increase of the fuzzy measure based on the similarity of φ<sub>⟨i⟩</sub> to the set of rules already involved in the integration, i.e., A<sub>⟨i+1⟩</sub> = {φ<sub>⟨i+1⟩</sub>, …, φ<sub>⟨r⟩</sub>}. If φ<sub>⟨i⟩</sub> is similar to the classifiers in A<sub>⟨i+1⟩</sub>, the increase in the fuzzy measure should be small (since the importance of the set A<sub>⟨i⟩</sub> should be similar to the importance of the set A<sub>⟨i+1⟩</sub>), and if φ<sub>⟨i⟩</sub> is not similar to the classifiers in A<sub>⟨i+1⟩</sub>, the increase of the fuzzy measure should be large. These ideas motivated the following definition:</p>
        <p>Definition 8. Let U = {u<sub>1</sub>, …, u<sub>r</sub>} be a set, let ∼ be a similarity w.r.t. a t-norm ∧, and let S be an r × r matrix such that:
          <disp-formula><label>(19)</label><tex-math>S = (s_{i,j})_{i,j=1}^{r} \quad \text{with } s_{i,j} = {\sim}(u_i, u_j).</tex-math></disp-formula>
          Let further κ<sub>i</sub> ∈ [0, 1], i = 1, …, r denote some kind of weight (confidence, importance) of u<sub>i</sub>, and let [·] denote index ordering according to κ, such that 0 ≤ κ<sub>[1]</sub> ≤ ⋯ ≤ κ<sub>[r]</sub> ≤ 1. Finally, let
          <disp-formula><label>(20)</label><tex-math>\tilde{\mu}^{(S)} : \mathcal{P}(U) \to [0, \infty)</tex-math></disp-formula>
          be a mapping such that for X ⊆ U,
          <disp-formula><label>(21)</label><tex-math>\tilde{\mu}^{(S)}(X) = \sum_{i=1}^{r} I(u_{[i]} \in X) \, \kappa_{[i]} \left( 1 - \max_{j=i+1}^{r} s_{[i],[j]} \right),</tex-math></disp-formula>
          where we define max<sub>j=r+1</sub><sup>r</sup> s<sub>[r],[j]</sub> = 0, and I denotes the indicator of truth value, i.e.,
          <disp-formula><label>(22)</label><tex-math>I(\mathrm{true}) = 1, \quad I(\mathrm{false}) = 0.</tex-math></disp-formula>
          Then the mapping μ<sup>(S)</sup> : P(U) → [0, 1], defined
          <disp-formula><label>(23)</label><tex-math>\mu^{(S)}(X) = \frac{\tilde{\mu}^{(S)}(X)}{\tilde{\mu}^{(S)}(U)},</tex-math></disp-formula>
          is called a similarity-aware measure based on S.</p>
        <p>Proposition 1. μ<sup>(S)</sup> is a fuzzy measure on U.</p>
        <p>Proof. The boundary conditions follow directly from the definition of μ<sup>(S)</sup>. For the monotonicity, let A ⊆ B; then
          <disp-formula><label>(24)</label><tex-math>\tilde{\mu}^{(S)}(A) = \sum_{i=1}^{r} I(u_{[i]} \in A) \, \kappa_{[i]} \left( 1 - \max_{j=i+1}^{r} s_{[i],[j]} \right) \le</tex-math></disp-formula>
          <disp-formula><label>(25)</label><tex-math>\le \sum_{i=1}^{r} I(u_{[i]} \in B) \, \kappa_{[i]} \left( 1 - \max_{j=i+1}^{r} s_{[i],[j]} \right) = \tilde{\mu}^{(S)}(B),</tex-math></disp-formula>
          due to I(u<sub>[i]</sub> ∈ A) = 1 ⇒ I(u<sub>[i]</sub> ∈ B) = 1.</p>
        <p>Proposition 2. For any of the 2<sup>r</sup> subsets X ⊆ U, the value μ<sup>(S)</sup>(X) can be expressed simply as the sum of values of μ<sup>(S)</sup> on singletons,
          <disp-formula><label>(26)</label><tex-math>\mu^{(S)}(X) = \sum_{u_i \in X} \mu^{(S)}(u_i).</tex-math></disp-formula></p>
        <p>Proof. According to (21) and (23), the value of μ<sup>(S)</sup> on the singleton u<sub>[i]</sub>, i = 1, …, r is
          <disp-formula><label>(27)</label><tex-math>\mu^{(S)}(u_{[i]}) = \frac{1}{\tilde{\mu}^{(S)}(U)} \, \kappa_{[i]} \left( 1 - \max_{j=i+1}^{r} s_{[i],[j]} \right).</tex-math></disp-formula>
          Then (26) follows directly from (21).</p>
        <p>The following propositions show that if for some i, the i-th classification rule is totally similar to some other rule in A<sub>⟨i+1⟩</sub>, then μ<sup>(S)</sup> does not increase, and if it is totally dissimilar to all classifiers in A<sub>⟨i+1⟩</sub>, the increase in μ<sup>(S)</sup> is maximal.</p>
        <p>Proposition 3. Let f : U → [0, 1], and let the matrix S in (19) fulfil
          <disp-formula><label>(28)</label><tex-math>s_{i,j} = 1 \quad \text{for } i \ne j.</tex-math></disp-formula>
          Then:</p>
        <list list-type="order">
          <list-item><p>(∀X ⊆ U) u<sub>[r]</sub> ∈ X ⇒ μ<sup>(S)</sup>(X) = 1,</p></list-item>
          <list-item><p>(∀X ⊆ U) u<sub>[r]</sub> ∉ X ⇒ μ<sup>(S)</sup>(X) = 0,</p></list-item>
          <list-item><p>(Ch)∫ f dμ<sup>(S)</sup> = (Su)∫ f dμ<sup>(S)</sup> = f<sub>[r]</sub>.</p></list-item>
        </list>
        <p>Proof. 1. and 2. follow directly from the fact that
          <disp-formula><label>(29)</label><tex-math><![CDATA[\max_{j=i+1}^{r} s_{[i],[j]} = \begin{cases} 0 & \text{for } i = r, \\ 1 & \text{for } i < r, \end{cases}]]></tex-math></disp-formula>
          and therefore
          <disp-formula><label>(30)</label><tex-math>\tilde{\mu}^{(S)}(X) = I(u_{[r]} \in X) \, \kappa_{[r]}.</tex-math></disp-formula>
          We will prove 3. only for the Choquet integral; the case of the Sugeno integral is analogous. Let j ∈ {1, …, r} be such that ⟨j⟩ = [r]; then (∀i &gt; j) u<sub>[r]</sub> ∉ A<sub>⟨i⟩</sub>, and therefore μ<sup>(S)</sup>(A<sub>⟨i⟩</sub>) = 0; (∀i ≤ j) u<sub>[r]</sub> ∈ A<sub>⟨i⟩</sub>, and therefore μ<sup>(S)</sup>(A<sub>⟨i⟩</sub>) = 1. Using this in the definition of the Choquet integral, we obtain
          <disp-formula><label>(31)</label><tex-math>(\mathrm{Ch}) \int f \, d\mu^{(S)} = \sum_{i=1}^{r} \left( f_{\langle i \rangle} - f_{\langle i-1 \rangle} \right) \mu^{(S)}(A_{\langle i \rangle}) = \sum_{i=1}^{j} \left( f_{\langle i \rangle} - f_{\langle i-1 \rangle} \right) = f_{\langle j \rangle} = f_{[r]}.</tex-math></disp-formula></p>
        <p>Proposition 4. Let f : U → [0, 1], and let the matrix S in (19) fulfil s<sub>i,j</sub> = 0 for i ≠ j. Then:</p>
        <list list-type="order">
          <list-item><p><disp-formula><tex-math>(\forall X \subseteq U) \quad \mu^{(S)}(X) = \frac{\sum_{i : u_{[i]} \in X} \kappa_{[i]}}{\sum_{i=1}^{r} \kappa_i},</tex-math></disp-formula></p></list-item>
          <list-item><p><disp-formula><tex-math>(\mathrm{Ch}) \int f \, d\mu^{(S)} = \frac{\sum_{i=1}^{r} \kappa_i f_i}{\sum_{i=1}^{r} \kappa_i},</tex-math></disp-formula></p></list-item>
          <list-item><p><disp-formula><tex-math>(\mathrm{Su}) \int f \, d\mu^{(S)} = \max_{k=1}^{r} \min\left( f_{\langle k \rangle}, \frac{\sum_{i=k}^{r} \kappa_{\langle i \rangle}}{\sum_{i=1}^{r} \kappa_i} \right).</tex-math></disp-formula></p></list-item>
        </list>
        <p>Proof. 1. follows directly from the definition of similarity-aware measure, and 2. and 3. are applications of 1. to the definition of the Choquet/Sugeno integral.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <label>4</label>
      <title>Experimental testing</title>
      <p>We have experimentally compared the performance of the proposed measure with the Sugeno λ-measure for the aggregation of classification rules by fuzzy integrals (Choquet, Sugeno). The ensembles have been created as random forests from rules obtained with classification trees [3], by bagging [2] from rules obtained with k-NN classifiers, and by the multiple feature subset method [1] from rules obtained with quadratic discriminant analysis.</p>
      <p>In this section, we present results of comparing the measures using 10-fold crossvalidation on 5 artificial and 11 real-world datasets (the properties of the datasets are shown in Table 1). For the random forests, the number of trees was set to r = 20, the number of features to explore in each node varied between 2 and 5 (depending on the dimensionality of the particular dataset), and the maximal size of a leaf was set to 10 (see [3] for description of the parameters). For the QDA and k-NN based ensembles, their size was also set to r = 20, and we used k = 5 as the number of neighbors for k-NN classifiers. As the weights κ<sub>1</sub>, …, κ<sub>r</sub> of the classification rules, we used
        <disp-formula><label>(32)</label><tex-math>\kappa_i(\varphi) = \frac{\sum_{x \in V(A_\varphi)} I\left( C_{\varphi'}(x) = C_\varphi(x) \right)}{|V(A_\varphi)|},</tex-math></disp-formula>
        where V(A<sub>φ</sub>) ⊆ V is the set of validation patterns belonging to some kind of neighborhood of A<sub>φ</sub>. For example, if A<sub>φ</sub> concerns values of vectors in a Euclidean space, then V(A<sub>φ</sub>) is the set of k nearest neighbors under the Euclidean metric of the set where the antecedent A<sub>φ</sub> is valid. The number of neighbors was set to 5, 10, or 20, depending on the size of the dataset.</p>
      <p>Table 1. Properties of the datasets. Columns: dataset, nr. of patterns, nr. of classes, dimension. Datasets: clouds, concentric, gauss-3D, glass, letters, pendigits, phoneme, pima, poker, ringnorm, satimage, transfusion, vowel, waveform, wine, yeast.</p>
      <p>Table 2 shows the results of the performed comparisons. We also measured the statistical significance of the pairwise improvements (using the analysis of variance on the 5% significance level by the Tukey-Kramer method).</p>
      <p>We interpret the results presented in Table 2 as a confirmation of the usefulness of similarity-aware fuzzy measures proposed in Definition 8.</p>
    </sec>
    <sec id="sec-6">
      <label>5</label>
      <title>Conclusion</title>
      <p>In this paper, we have studied the application of the fuzzy integral as an aggregation operator for classification rules in the context of their similarities. We have shown that traditionally used symmetric, or additive and other ⊥-decomposable measures are not a good choice for combining classification rules by fuzzy integral and we have defined similarity-aware measures, which take into account both the confidence / importance and the similarities of the aggregated rules. We have shown some basic theoretical properties and special cases of the measures, including the fact that apart from the singletons, the 2<sup>r</sup> values of μ are obtained using only summation. In addition, we have experimentally compared the performance of the measures to the Sugeno λ-measure using Choquet and Sugeno fuzzy integrals on 16 benchmark datasets for 3 different ways of obtaining ensembles of classification rules. The experimental comparison clearly supports our theoretical conjecture that similarity-aware measures are more suitable for the aggregation of classification rules than traditionally used additive and ⊥-decomposable fuzzy measures.</p>
    </sec>
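    <sec>
      <p>The construction of the similarity-aware measure in Definition 8 and the two extreme cases of Propositions 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the representation of subsets as lists of indices, the arbitrary handling of ties among the weights κ and the example numbers are hypothetical and are not taken from the paper.</p>
      <preformat><![CDATA[
```python
def similarity_aware_measure(kappa, S):
    """Return mu(X) of Definition 8 for index lists X.

    kappa: weights of the r rules, S: r x r similarity matrix.
    Ties among the weights are ordered arbitrarily (by original index).
    """
    r = len(kappa)
    order = sorted(range(r), key=lambda i: kappa[i])  # order[p] indexes u_[p+1]
    gain = [0.0] * r
    for p, i in enumerate(order):
        # Largest similarity to any rule with a higher rank; 0 for the last one.
        max_sim = max((S[i][j] for j in order[p + 1:]), default=0.0)
        gain[i] = kappa[i] * (1.0 - max_sim)
    total = sum(gain)                                 # unnormalised value on U
    if total <= 0.0:
        raise ValueError("normalising constant is zero")

    def mu(subset):
        return sum(gain[i] for i in subset) / total   # additivity over singletons
    return mu


def choquet(f, mu):
    """Choquet integral of the values f with respect to mu."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        total += (f[i] - previous) * mu(order[pos:])
        previous = f[i]
    return total


kappa = [0.6, 0.9, 0.7]      # hypothetical rule weights
f = [0.2, 0.8, 0.5]          # hypothetical rule outputs for one class

all_similar = [[1.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

mu_similar = similarity_aware_measure(kappa, all_similar)
mu_distinct = similarity_aware_measure(kappa, identity)
print(choquet(f, mu_similar))    # output of the rule with the largest weight
print(choquet(f, mu_distinct))   # kappa-weighted mean of the outputs
```
]]></preformat>
      <p>When all rules are totally similar, only the rule with the largest weight contributes, and when they are totally dissimilar, the Choquet integral reduces to the κ-weighted mean, in line with Propositions 3 and 4.</p>
    </sec>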
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. S. D. Bay:
          <article-title>Nearest neighbor classification from multiple feature subsets</article-title>
          .
          <source>Intelligent Data Analysis</source>
          <volume>3</volume>
          ,
          <year>1999</year>
          ,
          <fpage>191</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. L. Breiman:
          <article-title>Bagging predictors</article-title>
          .
          <source>Machine Learning</source>
          <volume>24</volume>
          ,
          <year>1996</year>
          ,
          <fpage>123</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. L. Breiman:
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          <volume>45</volume>
          ,
          <year>2001</year>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Machine Learning Group Catholic University of Leuven. Elena database. http://mlg.info.ucl.ac.be/index.php?page=Elena.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Dubois</surname>
          </string-name>
          , E. Hüllermeier, H. Prade:
          <article-title>A systematic approach to the assessment of fuzzy association rules</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>13</volume>
          ,
          <year>2006</year>
          ,
          <fpage>167</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          <article-title>: Choosing the right lens: Finding what is interesting in data mining</article-title>
          . In F. Guillet and
          <string-name>
            <surname>H. J. Hamilton</surname>
          </string-name>
          , (Eds),
          <source>Quality Measures in Data Mining</source>
          , Springer Verlag, Berlin,
          <year>2007</year>
          ,
          <fpage>3</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Grabisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Walker</surname>
          </string-name>
          <article-title>: Fundamentals of uncertainty calculi with applications to fuzzy inference</article-title>
          . Kluwer Academic Publishers, Dordrecht,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P.</given-names>
            <surname>Hájek</surname>
          </string-name>
          <article-title>: Metamathematics of fuzzy logic</article-title>
          . Kluwer Academic Publishers, Dordrecht,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hand</surname>
          </string-name>
          <article-title>: Construction and assessment of classification rules</article-title>
          . John Wiley and Sons, New York,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. M. Holeňa:
          <article-title>Measures of ruleset quality capable to represent uncertain validity</article-title>
          . Submitted to
          <source>International Journal of Approximate Reasoning</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. H. R.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sabourin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Britto</surname>
          </string-name>
          <article-title>: From dynamic classifier selection to dynamic ensemble selection</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>41</volume>
          ,
          <year>2008</year>
          ,
          <fpage>1718</fpage>
          -
          <lpage>1731</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          <article-title>: Fuzzy versus nonfuzzy in combining classifiers designed by boosting</article-title>
          .
          <source>IEEE Transactions on Fuzzy Systems</source>
          <volume>11</volume>
          ,
          <year>2003</year>
          ,
          <fpage>729</fpage>
          -
          <lpage>741</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          <article-title>: Combining pattern classifiers: methods and algorithms</article-title>
          . John Wiley and Sons, New York,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Whitaker</surname>
          </string-name>
          <article-title>: Measures of diversity in classifier ensembles</article-title>
          .
          <source>Machine Learning</source>
          <volume>51</volume>
          ,
          <year>2003</year>
          ,
          <fpage>181</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Coleman</surname>
          </string-name>
          <article-title>: Machine learning based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research</article-title>
          .
          <source>International Journal of Approximate Reasoning</source>
          <volume>47</volume>
          ,
          <year>2008</year>
          ,
          <fpage>17</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. L. Rokach:
          <article-title>Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography</article-title>
          .
          <source>Computational Statistics and Data Analysis</source>
          <volume>53</volume>
          ,
          <year>2009</year>
          ,
          <fpage>4046</fpage>
          -
          <lpage>4072</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>V.</given-names>
            <surname>Torra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Narukawa</surname>
          </string-name>
          <article-title>: Modeling decisions: information fusion and aggregation operators</article-title>
          . Springer Verlag, Berlin,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Machine Learning Group University of California Irvine.
          <article-title>Repository of machine learning databases</article-title>
          . http://www.ics.uci.edu/~mlearn/MLRepository.html.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.</given-names>
            <surname>Štefka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          <article-title>: Dynamic classifier systems and their applications to random forest ensembles</article-title>
          . In
          <source>Adaptive and Natural Computing Algorithms. Lecture Notes in Computer Science</source>
          <volume>5495</volume>
          , Springer Verlag, Berlin,
          <year>2009</year>
          ,
          <fpage>458</fpage>
          -
          <lpage>468</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>