=Paper=
{{Paper
|id=None
|storemode=property
|title=Fuzzy classification rules based on similarity
|pdfUrl=https://ceur-ws.org/Vol-990/paper5.pdf
|volume=Vol-990
|dblpUrl=https://dblp.org/rec/conf/itat/HolenaS12
}}
==Fuzzy classification rules based on similarity==
Fuzzy classification rules based on similarity⋆

Martin Holeňa¹ and David Štefka²

¹ Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 182 07 Prague
² Faculty of Nuclear Science and Physical Engineering, Czech Technical University, Trojanova 13, 120 00 Prague

⋆ The research reported in this paper has been supported by the Czech Science Foundation (GA ČR) grant P202/11/1368.

Abstract. The paper deals with the aggregation of classification rules by means of fuzzy integrals, in particular with the fuzzy measures employed in that aggregation. It points out that the kinds of fuzzy measures commonly encountered in this context do not take into account the diversity of classification rules. As a remedy, a new kind of fuzzy measures is proposed, called similarity-aware measures, and several useful properties of such measures are proven. Finally, results of extensive experiments on a number of benchmark datasets are reported, in which a particular similarity-aware measure was applied to a combination of Choquet or Sugeno integrals with three different ways of creating ensembles of classification rules. In the experiments, the new measure was compared with the traditional Sugeno λ-measure, to which it was clearly superior.

1 Introduction

Logical formulas of specific kinds, usually called rules, are a traditional way of formally representing knowledge. Therefore, it is not surprising that they are also the most frequent representation of the knowledge discovered in data mining.

The most natural basis for differentiating between existing rule extraction methods is the syntax and semantics of the extracted rules [10]. Syntactical differences between them are, however, not very deep because, in principle, any rule r from a ruleset R has one of the forms S_r ∼ S_r′ or A_r → C_r, where S_r, S_r′, A_r and C_r are formulas of the considered logic, and ∼, → are symbols of the language of that logic. The difference between the two forms concerns semantic properties of the symbols ∼ and →: S_r ∼ S_r′ is symmetric with respect to S_r, S_r′ in the sense that its validity always coincides with that of S_r′ ∼ S_r, whereas A_r → C_r is not symmetric with respect to A_r, C_r in that sense. In the case of a propositional logic, ∼ and → are the connectives equivalence (≡) and implication, respectively, whereas in the case of a predicate logic, they are generalized quantifiers. To distinguish the formulas involved in the asymmetric case, A_r is called the antecedent and C_r the consequent of r.

More important is the semantics of the rules (cf. [5]), especially the difference between rules of the Boolean logic and rules of a fuzzy logic. Due to the semantics of Boolean and fuzzy formulas, the former are valid for crisp sets of objects, whereas the validity of the latter is a fuzzy set on the universe of all considered objects.

Boolean rulesets are extracted more frequently, especially some specific types of them, such as classification rulesets [6, 9]. Those are sets of implications such that {A_r}_{r∈R} and {C_r}_{r∈R} partition the set O of considered objects, where {·}_{r∈R} stands for the set of distinct formulas in (·)_{r∈R}. Abandoning the requirement that {A_r}_{r∈R} partitions O (at least in the sense of a crisp partitioning) allows those rulesets to be generalized to fuzzy antecedents [15]. For Boolean antecedents, however, this requirement entails a natural definition of the validity of a whole classification ruleset R for an object x. Assuming that all information about x conveyed by R is conveyed by the single rule r covering x (i.e., with A_r valid for x), the validity of R for x can be defined to coincide with the validity of A_r → C_r for that r, which in turn equals the validity of C_r for x.

It is also possible to combine several existing classification rules into a new one. Such aggregation can be either static, i.e., the result is the same for all inputs, or dynamic, where it is adapted to the currently classified input [11, 19]. In the aggregation of classification rules, we usually try to create a team of rules that are not similar. This property is called diversity [14]. There are many methods for building a diverse team of classifiers [2, 3, 16].

One popular aggregation operator is the fuzzy integral [7, 12, 13, 17]. It aggregates the outputs of the individual classification rules with respect to a fuzzy measure. The role of fuzzy measures in the aggregation of classification rules, in particular their role with respect to the diversity of the rules, was the subject of the research reported in this paper.

The following section recalls the fuzzy integrals and fuzzy measures encountered in the aggregation of classification rules. In Section 3, which is the key section of the paper, a new fuzzy measure, called the similarity-aware measure, is introduced and its theoretical properties are studied. Finally, in Section 4, results of extensive experiments and a comparison with the traditional Sugeno λ-measure are reported.
2 Fuzzy integrals and measures in classification rules aggregation

Several definitions of a fuzzy integral exist in the literature – among them, the Choquet integral and the Sugeno integral are used most often. The role played in usual integration by additive measures (such as probability or the Lebesgue measure) is in fuzzy integration played by fuzzy measures. In this section, basic concepts pertaining to different kinds of fuzzy measures are recalled, as well as the definitions of the Choquet and Sugeno integrals. Due to the intended context of aggregation of classification rules, we restrict attention to [0, 1]-valued functions on finite sets.

Definition 1. A fuzzy measure µ on a finite set U = {u_1, . . . , u_r} is a function on the power set of U,

  µ : P(U) → [0, 1]   (1)

fulfilling:
1. the boundary conditions
  µ(∅) = 0, µ(U) = 1   (2)
2. the monotonicity
  A ⊆ B ⇒ µ(A) ≤ µ(B)   (3)

The values µ({u_1}), . . . , µ({u_r}) are called fuzzy densities.

Definition 2. The Choquet integral of a function f : U → [0, 1], f(u_i) = f_i, i = 1, . . . , r, with respect to a fuzzy measure µ is defined as

  (Ch) ∫ f dµ = Σ_{i=1}^r (f_<i> − f_<i−1>) µ(A_<i>),   (4)

where <·> indicates that the indices have been permuted such that 0 = f_<0> ≤ f_<1> ≤ · · · ≤ f_<r> ≤ 1, and A_<i> = {u_<i>, . . . , u_<r>} denotes the set of elements of U corresponding to the (r − i + 1) highest values of f.

Definition 3. The Sugeno integral of a function f : U → [0, 1], f(u_i) = f_i, i = 1, . . . , r, with respect to a fuzzy measure µ is defined as

  (Su) ∫ f dµ = max_{i=1,...,r} min(f_<i>, µ(A_<i>)).   (5)

To define a general fuzzy measure in the discrete case, we need to define all its 2^r values, which is usually very complicated. To overcome this weakness, measures which do not need all the 2^r values have been developed [7, 17]:

Definition 4. A fuzzy measure µ on U is called symmetric if

  |A| = |B| ⇒ µ(A) = µ(B)   (6)
  for A, B ⊆ U,   (7)

where | · | denotes the cardinality of a set.

Consequently, the value of a symmetric measure depends only on the cardinality of its argument. If a symmetric measure is used in the Choquet integral, the integral reduces to the ordered weighted average operator [17]. However, symmetric measures assume that all elements of U have the same importance, thus they do not take into account the diversity of elements.

Definition 5. Let ⊥ be a t-conorm. A fuzzy measure µ is called ⊥-decomposable if

  µ(A ∪ B) = µ(A) ⊥ µ(B) for disjoint A, B ⊆ U.   (8)

Hence, ⊥-decomposable measures need only the r fuzzy densities, whereas all the other values are computed using the formula (8). Particular cases of this kind of fuzzy measures are additive measures, including probabilistic measures (⊥ being the bounded sum), and the Sugeno λ-measure.

Definition 6. The Sugeno λ-measure [7, 17] on a finite set U = {u_1, . . . , u_r} is defined by

  µ(A ∪ B) = µ(A) + µ(B) + λµ(A)µ(B),   (9)

for disjoint A, B ⊆ U and some fixed λ > −1. The value of λ is:
a) computed as the unique non-zero root greater than −1 of the equation

  λ + 1 = Π_{i=1,...,r} (1 + λµ({u_i}))   (10)

if the densities do not sum up to 1;
b) λ = 0 otherwise.

If the densities sum up to 1, the fuzzy measure is additive. The Sugeno λ-measure is a ⊥-decomposable measure for the t-conorm

  x ⊥ y = min(1, x + y + λxy).   (11)

A serious weakness of any ⊥-decomposable measure is that the fuzzy measure of a set of two (or more) classification rules is fully determined by the formula (8) for a fixed ⊥. Therefore, if interactions between elements are to be taken into account, they have to be incorporated directly into the fuzzy measure. That fact motivated our attempt to elaborate the concept of similarity-aware fuzzy measures.
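For concreteness, the following Python sketch (ours, not taken from the paper; all identifiers are made up for illustration) implements the discrete Choquet and Sugeno integrals of Definitions 2 and 3 and constructs a Sugeno λ-measure from given fuzzy densities via Eq. (10), assuming at least two rules with densities in (0, 1) so that the non-zero root exists and can be bracketed.

<syntaxhighlight lang="python">
def sugeno_lambda_measure(densities):
    """Sugeno lambda-measure of Definition 6, built from the fuzzy densities
    g_i = mu({u_i}); returns a function mapping a set of indices to [0, 1]."""
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        lam = 0.0                                    # case b): additive measure
    else:
        def h(lam):                                  # h(lam) = prod(1 + lam*g_i) - (lam + 1), cf. Eq. (10)
            p = 1.0
            for g in densities:
                p *= 1.0 + lam * g
            return p - (lam + 1.0)
        # the non-zero root of Eq. (10) lies in (0, inf) if the densities sum
        # to less than 1 and in (-1, 0) otherwise; locate it by bisection
        lo, hi = (1e-9, 1e6) if s < 1.0 else (-1.0 + 1e-9, -1e-9)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if h(lo) * h(mid) <= 0.0:
                hi = mid
            else:
                lo = mid
        lam = 0.5 * (lo + hi)

    def mu(subset):
        value = 0.0
        for i in subset:                             # repeated application of Eq. (9)
            value = value + densities[i] + lam * value * densities[i]
        return min(1.0, value)                       # guard against rounding slightly above 1
    return mu


def choquet_integral(f, mu):
    """Discrete Choquet integral of f = [f_1, ..., f_r] w.r.t. mu, Eq. (4)."""
    order = sorted(range(len(f)), key=lambda i: f[i])          # f_<1> <= ... <= f_<r>
    result, previous = 0.0, 0.0                                # f_<0> = 0
    for k, i in enumerate(order):
        a_i = frozenset(order[k:])                             # indices of the r-k highest values of f
        result += (f[i] - previous) * mu(a_i)
        previous = f[i]
    return result


def sugeno_integral(f, mu):
    """Discrete Sugeno integral of f = [f_1, ..., f_r] w.r.t. mu, Eq. (5)."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    return max(min(f[i], mu(frozenset(order[k:]))) for k, i in enumerate(order))
</syntaxhighlight>

With densities such as (0.2, 0.3, 0.4), which sum to less than 1, the construction yields λ > 0, and both integrals of any f lie between min(f) and max(f), as expected of averaging aggregation operators.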
3 Similarity-aware measures and their properties

Before introducing similarity-aware measures, let us first recall the notion of similarity [8].

Definition 7. Let ∧ be a t-norm and let ∼ : U × U → [0, 1] be a fuzzy relation. ∼ is called a similarity on U with respect to ∧ if the following holds for a, b, c ∈ U:

  ∼(a, a) = 1 (reflexivity),   (12)
  ∼(a, b) = ∼(b, a) (symmetry),   (13)
  ∼(a, b) ∧ ∼(b, c) ≤ ∼(a, c) (transitivity w.r.t. ∧).   (14)

In the context of aggregation of crisp classification rules, we will work with an empirically defined relation which, for rules φ_k, φ_l, is defined as the proportion of equal consequents on some validation set of patterns V ⊂ O,

  ∼(φ_k, φ_l) = Σ_{x∈V} I(C_{φ_k}(x) = C_{φ_l}(x)) / |V|.   (15)

It is easily seen that the relation (15) is a similarity with respect to the Łukasiewicz t-norm

  ∧_L(a, b) = max(a + b − 1, 0),   (16)

but it is not a similarity with respect to the standard (minimum, Gödel) t-norm

  ∧_S(a, b) = min(a, b),   (17)

or the product t-norm

  ∧_P(a, b) = ab.   (18)

The fuzzy integral represents a convenient tool to work with the diversity of classification rules: as we are computing the fuzzy measure values µ(A_<i>), we are considering a single rule φ_<i> at each step i, and therefore we can influence the increase of the fuzzy measure based on the similarity of φ_<i> to the set of rules already involved in the integration, i.e., A_<i+1> = {φ_<i+1>, . . . , φ_<r>}. If φ_<i> is similar to the classifiers in A_<i+1>, the increase in the fuzzy measure should be small (since the importance of the set A_<i> should be similar to the importance of the set A_<i+1>), and if φ_<i> is not similar to the classifiers in A_<i+1>, the increase of the fuzzy measure should be large. These ideas motivated the following definition:

Definition 8. Let U = {u_1, . . . , u_r} be a set, let ∼ be a similarity w.r.t. a t-norm ∧, and let S be an r × r matrix such that

  S = (s_{i,j})_{i,j=1}^r with s_{i,j} = ∼(u_i, u_j).   (19)

Let further κ_i ∈ [0, 1], i = 1, . . . , r, denote some kind of weight (confidence, importance) of u_i, and let [·] denote the index ordering according to κ, such that 0 ≤ κ_[1] ≤ · · · ≤ κ_[r] ≤ 1. Finally, let

  µ̃^(S) : P(U) → [0, ∞)   (20)

be a mapping such that for X ⊆ U,

  µ̃^(S)(X) = Σ_{i=1}^r I(u_[i] ∈ X) κ_[i] (1 − max_{j=i+1,...,r} s_[i],[j]),   (21)

where we define max_{j=r+1,...,r} s_[r],[j] = 0, and I denotes the indicator of truth value, i.e.,

  I(true) = 1, I(false) = 0.   (22)

Then the mapping

  µ^(S) : P(U) → [0, 1], defined by   (23)

  µ^(S)(X) = µ̃^(S)(X) / µ̃^(S)(U),   (24)

is called a similarity-aware measure based on S.
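A minimal Python sketch (again ours; the function names are illustrative, not from the paper) of the empirical similarity (15) and of the similarity-aware measure of Definition 8; it assumes µ̃^(S)(U) > 0, so that the normalization (24) is well defined.

<syntaxhighlight lang="python">
def empirical_similarity(rules, validation):
    """Empirical similarity of Eq. (15): the proportion of validation patterns
    on which two rules assign the same class; `rules` are callables x -> class."""
    r = len(rules)
    S = [[0.0] * r for _ in range(r)]
    for k in range(r):
        for l in range(r):
            agree = sum(1 for x in validation if rules[k](x) == rules[l](x))
            S[k][l] = agree / len(validation)
    return S


def similarity_aware_measure(kappa, S):
    """Similarity-aware measure of Definition 8.
    kappa : weights (confidences) kappa_1, ..., kappa_r in [0, 1]
    S     : r x r similarity matrix, S[i][j] = sim(u_i, u_j)
    Returns mu, mapping a set of rule indices {0, ..., r-1} to [0, 1]."""
    r = len(kappa)
    order = sorted(range(r), key=lambda i: kappa[i])             # kappa_[1] <= ... <= kappa_[r]
    contribution = {}
    for position, i in enumerate(order):
        later = order[position + 1:]                             # rules with larger weights
        max_sim = max((S[i][j] for j in later), default=0.0)     # empty max defined as 0, Eq. (21)
        contribution[i] = kappa[i] * (1.0 - max_sim)             # weight discounted by redundancy
    total = sum(contribution.values())                           # normalising constant mu~(U), Eq. (24)
    def mu(X):
        return sum(contribution[i] for i in X) / total           # additive over singletons (Prop. 2)
    return mu
</syntaxhighlight>

The discounting factor 1 − max_{j>i} s_[i],[j] charges each rule only for its redundancy with rules of higher confidence, which is how the measure accounts for diversity.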
Proposition 1. µ^(S) is a fuzzy measure on U.

Proof. The boundary conditions follow directly from the definition of µ^(S). For the monotonicity, let A ⊆ B; then

  µ̃^(S)(A) = Σ_{i=1}^r I(u_[i] ∈ A) κ_[i] (1 − max_{j=i+1,...,r} s_[i],[j])
            ≤ Σ_{i=1}^r I(u_[i] ∈ B) κ_[i] (1 − max_{j=i+1,...,r} s_[i],[j]) = µ̃^(S)(B),   (25)

due to I(u_[i] ∈ A) = 1 ⇒ I(u_[i] ∈ B) = 1.

Proposition 2. For any of the 2^r subsets X ⊆ U, the value µ^(S)(X) can be expressed simply as the sum of the values of µ^(S) on singletons,

  µ^(S)(X) = Σ_{u_i ∈ X} µ^(S)({u_i}).   (26)

Proof. According to (21) and (23), the value of µ^(S) on the singleton {u_[i]}, i = 1, . . . , r, is

  µ^(S)({u_[i]}) = κ_[i] (1 − max_{j=i+1,...,r} s_[i],[j]) / µ̃^(S)(U).   (27)

Then (26) follows directly from (21).

The following propositions show that if for some i, the i-th classification rule is totally similar to some other rule among u_[i+1], . . . , u_[r], then µ^(S) does not increase when u_[i] is added, and if it is totally unsimilar to all of them, the increase in µ^(S) is maximal.

Proposition 3. Let f : U → [0, 1], and let the matrix S in (19) fulfill

  s_{i,j} = 1 for i ≠ j.   (28)

Then:
1. (∀X ⊆ U) u_[r] ∈ X ⇒ µ^(S)(X) = 1,
2. (∀X ⊆ U) u_[r] ∉ X ⇒ µ^(S)(X) = 0,
3. (Ch) ∫ f dµ^(S) = (Su) ∫ f dµ^(S) = f_[r].

Proof. 1. and 2. follow directly from the fact that

  max_{j=i+1,...,r} s_[i],[j] = 0 for i = r, and 1 for i < r,   (29)

and therefore

  µ̃^(S)(X) = I(u_[r] ∈ X) κ_[r].   (30)

We will prove 3. only for the Choquet integral; the case of the Sugeno integral is analogous. Let j ∈ {1, . . . , r} be such that <j> = [r]; then (∀i > j) u_[r] ∉ A_<i>, and therefore µ^(S)(A_<i>) = 0, while (∀i ≤ j) u_[r] ∈ A_<i>, and therefore µ^(S)(A_<i>) = 1. Using this in the definition of the Choquet integral, we obtain

  (Ch) ∫ f dµ^(S) = Σ_{i=1}^r (f_<i> − f_<i−1>) µ^(S)(A_<i>) = Σ_{i=1}^j (f_<i> − f_<i−1>) = f_<j> = f_[r].   (31)

Proposition 4. Let f : U → [0, 1], and let the matrix S in (19) fulfill s_{i,j} = 0 for i ≠ j. Then:
1. (∀X ⊆ U) µ^(S)(X) = Σ_{i: u_[i] ∈ X} κ_[i] / Σ_{i=1}^r κ_i,
2. (Ch) ∫ f dµ^(S) = Σ_{i=1}^r κ_i f_i / Σ_{i=1}^r κ_i,
3. (Su) ∫ f dµ^(S) = max_{k=1,...,r} min(f_<k>, Σ_{i=k}^r κ_<i> / Σ_{i=1}^r κ_i).

Proof. 1. follows directly from the definition of the similarity-aware measure, and 2. and 3. are applications of 1. to the definitions of the Choquet and Sugeno integrals.
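As a quick sanity check of Proposition 2 and of the boundary condition µ^(S)(U) = 1, here is a small made-up example, assuming the similarity_aware_measure sketch given after Definition 8; the confidences and similarities are invented for illustration only.

<syntaxhighlight lang="python">
# Hypothetical confidences and similarities of three rules, for illustration only.
kappa = [0.9, 0.6, 0.8]
S = [[1.0, 0.7, 0.2],
     [0.7, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
mu = similarity_aware_measure(kappa, S)

X = {0, 2}
assert abs(mu(X) - (mu({0}) + mu({2}))) < 1e-12    # mu(X) is the sum of its singleton values
assert abs(mu({0, 1, 2}) - 1.0) < 1e-12            # boundary condition mu(U) = 1
</syntaxhighlight>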
4 Experimental testing

We have experimentally compared the performance of the proposed measure with the Sugeno λ-measure for the aggregation of classification rules by fuzzy integrals (Choquet, Sugeno). The ensembles have been created as random forests from rules obtained with classification trees [3], by bagging [2] from rules obtained with k-NN classifiers, and by the multiple feature subset method [1] from rules obtained with quadratic discriminant analysis.

In this section, we present the results of comparing the measures using 10-fold cross-validation on 5 artificial and 11 real-world datasets (the properties of the datasets are shown in Table 1). For the random forests, the number of trees was set to r = 20, the number of features to explore in each node varied between 2 and 5 (depending on the dimensionality of the particular dataset), and the maximal size of a leaf was set to 10 (see [3] for a description of the parameters). For the QDA- and k-NN-based ensembles, their size was also set to r = 20, and we used k = 5 as the number of neighbors for the k-NN classifiers. As the weights κ_1, . . . , κ_r of the classification rules, we used

  κ_i(φ) = Σ_{x∈V(A_φ)} I(C_{φ′}(x) = C_φ(x)) / |V(A_φ)|,   (32)

where V(A_φ) ⊆ V is the set of validation patterns belonging to some kind of neighborhood of A_φ. For example, if A_φ concerns values of vectors in a Euclidean space, then V(A_φ) is the set of k nearest neighbors, under the Euclidean metric, of the set where the antecedent A_φ is valid. The number of neighbors was set to 5, 10, or 20, depending on the size of the dataset.
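As an illustration only: the weight (32) can be read as a local accuracy estimate, provided C_{φ′}(x) denotes the reference class of the validation pattern x. That reading, and the sketch below, are ours and are not spelled out in the paper.

<syntaxhighlight lang="python">
def rule_weight(rule, neighbourhood):
    """Sketch of the weight kappa_i(phi) of Eq. (32), read as local accuracy:
    the fraction of validation patterns in V(A_phi) whose reference class
    agrees with the class assigned by the rule (this reading of C_phi' is ours).
    `neighbourhood` is a list of (x, reference_class) pairs from V(A_phi)."""
    hits = sum(1 for x, reference_class in neighbourhood if rule(x) == reference_class)
    return hits / len(neighbourhood)
</syntaxhighlight>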
Table 2 shows the results of the performed comparisons. We also measured the statistical significance of the pairwise improvements (using the analysis of variance at the 5% level with the Tukey-Kramer method).

We interpret the results presented in Table 2 as a confirmation of the usefulness of the similarity-aware fuzzy measures proposed in Definition 8.

dataset            nr. of patterns   nr. of classes   dimension
Artificial
clouds [4]               5000                2               2
concentric [4]           2500                2               2
gauss 3D [4]             5000                2               3
ringnorm [18]            3000                2              20
waveform [18]            5000                3              21
Real-world
glass [18]                214                7               9
letters [18]            20000               26              16
pendigits [18]          10992               10              16
phoneme [4]              5427                2               5
pima [18]                 768                2               8
poker [18]               4828                3              10
satimage [4]             6435                6               4
transfusion [18]          748                2               4
vowel [18]                990               11              10
wine [18]                 178                3              13
yeast [18]               1484                4               8

Table 1. Datasets used in the experiments.

5 Conclusion

In this paper, we have studied the application of the fuzzy integral as an aggregation operator for classification rules in the context of their similarities. We have shown that the traditionally used symmetric, or additive and other ⊥-decomposable measures are not a good choice for combining classification rules by a fuzzy integral, and we have defined similarity-aware measures, which take into account both the confidence / importance and the similarities of the aggregated rules. We have shown some basic theoretical properties and special cases of these measures, including the fact that apart from the singletons, the 2^r values of µ^(S) are obtained using only summation. In addition, we have experimentally compared the performance of the measures with that of the Sugeno λ-measure, using the Choquet and Sugeno fuzzy integrals on 16 benchmark datasets and 3 different ways of obtaining ensembles of classification rules. The experimental comparison clearly supports our theoretical conjecture that similarity-aware measures are more suitable for the aggregation of classification rules than the traditionally used additive and ⊥-decomposable fuzzy measures.

References

1. S. D. Bay: Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis 3, 1999, 191–209.
2. L. Breiman: Bagging predictors. Machine Learning 24, 1996, 123–140.
3. L. Breiman: Random forests. Machine Learning 45, 2001, 5–32.
4. Machine Learning Group, Catholic University of Leuven: Elena database. http://mlg.info.ucl.ac.be/index.php?page=Elena.
5. D. Dubois, E. Hüllermeier, H. Prade: A systematic approach to the assessment of fuzzy association rules. Data Mining and Knowledge Discovery 13, 2006, 167–192.
6. L. Geng, H. J. Hamilton: Choosing the right lens: Finding what is interesting in data mining. In F. Guillet and H. J. Hamilton (Eds), Quality Measures in Data Mining, Springer Verlag, Berlin, 2007, 3–24.
7. M. Grabisch, H. T. Nguyen, E. A. Walker: Fundamentals of uncertainty calculi with applications to fuzzy inference. Kluwer Academic Publishers, Dordrecht, 1994.
8. P. Hájek: Metamathematics of fuzzy logic. Kluwer Academic Publishers, Dordrecht, 1998.
9. D. J. Hand: Construction and assessment of classification rules. John Wiley and Sons, New York, 1997.
10. M. Holeňa: Measures of ruleset quality capable to represent uncertain validity. Submitted to International Journal of Approximate Reasoning.
11. A. H. R. Ko, R. Sabourin, A. S. Britto: From dynamic classifier selection to dynamic ensemble selection. Pattern Recognition 41, 2008, 1718–1731.
12. L. I. Kuncheva: Fuzzy versus nonfuzzy in combining classifiers designed by boosting. IEEE Transactions on Fuzzy Systems 11, 2003, 729–741.
13. L. I. Kuncheva: Combining pattern classifiers: methods and algorithms. John Wiley and Sons, New York, 2004.
14. L. I. Kuncheva, C. J. Whitaker: Measures of diversity in classifier ensembles. Machine Learning 51, 2003, 181–207.
15. L. E. Peterson, M. A. Coleman: Machine learning based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research. International Journal of Approximate Reasoning 47, 2008, 17–36.
16. L. Rokach: Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics and Data Analysis 53, 2009, 4046–4072.
17. V. Torra, Y. Narukawa: Modeling decisions: information fusion and aggregation operators. Springer Verlag, Berlin, 2007.
18. Machine Learning Group, University of California Irvine: Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
19. D. Štefka, M. Holeňa: Dynamic classifier systems and their applications to random forest ensembles. In Adaptive and Natural Computing Algorithms, Lecture Notes in Computer Science 5495, Springer Verlag, Berlin, 2009, 458–468.
dataset          Choquet integral               Sugeno integral
                 λ-measure      µ^(S)           λ-measure      µ^(S)

random forests
clouds           12.40 ± 1.81   12.25 ± 1.85    12.80 ± 1.64   12.33 ± 1.47
concentric        4.32 ± 1.25    2.82 ± 1.30     3.24 ± 1.37    2.98 ± 1.50
gauss-3D         23.92 ± 2.97   22.76 ± 1.59    24.60 ± 1.36   23.28 ± 1.58
glass            21.3 ± 10.3    14.1 ± 3.5      24.1 ± 7.0     17.5 ± 9.4
letters           7.1 ± 0.6      7.3 ± 0.2       8.0 ± 0.6      7.9 ± 0.8
pendigits         3.1 ± 0.5      2.7 ± 0.5       3.2 ± 0.4      3.8 ± 0.7
phoneme          12.4 ± 1.2     13.2 ± 1.9      12.7 ± 0.8     13.3 ± 1.6
pima             26.0 ± 4.8     23.8 ± 2.0      25.0 ± 2.2     23.9 ± 3.6
poker            46.5 ± 3.0     44.4 ± 1.3      46.5 ± 1.5     45.1 ± 1.9
ringnorm         13.27 ± 2.11    7.69 ± 2.06    12.74 ± 2.08    7.46 ± 1.79
satimage         14.7 ± 1.4     14.3 ± 1.3      14.9 ± 0.9     14.8 ± 1.4
transfusion       4.8 ± 1.1      2.3 ± 0.7       4.9 ± 1.0      2.6 ± 0.7
vowel            14.5 ± 3.0     13.1 ± 3.5      17.0 ± 5.3     13.4 ± 3.8
waveform         18.56 ± 2.42   17.93 ± 1.89    18.24 ± 3.04   18.23 ± 1.58
wine              5.6 ± 6.0      3.3 ± 5.5       3.4 ± 4.0      6.6 ± 5.8
yeast            38.2 ± 4.1     34.8 ± 2.6      38.5 ± 3.7     36.3 ± 3.4

k-NN classifiers
clouds           11.93 ± 2.29   12.12 ± 1.57    12.64 ± 2.48   12.96 ± 2.26
concentric        1.39 ± 0.77    1.72 ± 0.57     1.30 ± 0.80    1.56 ± 0.64
gauss-3D         26.71 ± 2.55   26.00 ± 2.88    27.68 ± 3.66   26.28 ± 2.74
glass            22.4 ± 9.8     20.7 ± 10.3     21.7 ± 11.1    19.3 ± 6.5
letters          17.7 ± 2.7     17.6 ± 2.9      19.3 ± 3.1     19.1 ± 2.7
pendigits         1.3 ± 0.8      1.4 ± 0.8       1.3 ± 0.5      1.3 ± 0.7
phoneme          14.6 ± 0.9     14.2 ± 2.4      14.4 ± 1.8     14.5 ± 1.7
pima             29.1 ± 5.1     30.2 ± 7.2      29.5 ± 4.4     30.3 ± 6.6
poker            45.3 ± 2.4     43.5 ± 2.3      47.2 ± 2.7     43.9 ± 1.4
ringnorm         36.20 ± 4.41   34.28 ± 2.59    33.56 ± 3.34   33.48 ± 2.94
satimage         16.5 ± 2.0     15.5 ± 1.7      16.8 ± 2.4     16.2 ± 2.3
transfusion      24.0 ± 4.0     23.4 ± 4.7      25.2 ± 3.5     24.0 ± 4.7
vowel             4.8 ± 2.2      4.0 ± 1.9       5.6 ± 2.1      7.0 ± 1.8
waveform         19.40 ± 2.10   18.28 ± 2.85    19.57 ± 2.20   19.04 ± 2.99
wine             30.0 ± 10.3    28.6 ± 14.8     31.2 ± 6.7     33.4 ± 16.0
yeast            41.8 ± 4.7     40.5 ± 2.6      42.6 ± 3.7     40.7 ± 3.6

QDA with multiple subsets
clouds           26.00 ± 2.70   23.14 ± 2.49    25.74 ± 1.92   22.66 ± 1.10
concentric        4.36 ± 1.96    3.68 ± 1.68     5.72 ± 1.84    3.40 ± 0.98
gauss-3D         23.87 ± 1.86   22.06 ± 2.10    23.96 ± 2.03   22.36 ± 2.12
glass            42.3 ± 10.9    38.5 ± 12.0     43.2 ± 14.9    32.4 ± 12.5
letters          17.1 ± 0.7     14.7 ± 0.7      17.2 ± 0.7     14.7 ± 0.8
pendigits         2.8 ± 0.5      2.2 ± 0.2       2.7 ± 0.5      2.7 ± 0.6
phoneme          25.4 ± 2.4     20.8 ± 1.4      24.7 ± 1.0     20.2 ± 2.2
pima             27.9 ± 4.7     25.5 ± 4.2      28.3 ± 3.3     26.1 ± 5.1
poker            66.1 ± 2.1     55.1 ± 2.3      66.3 ± 3.9     55.1 ± 2.1
ringnorm          1.94 ± 0.96    2.53 ± 1.01     1.68 ± 0.60    3.66 ± 1.31
satimage         17.0 ± 1.3     15.7 ± 1.1      17.2 ± 2.0     16.2 ± 1.3
transfusion      29.6 ± 8.6     22.3 ± 4.5      29.2 ± 7.1     23.4 ± 3.4
vowel            16.7 ± 5.4     14.0 ± 3.8      18.1 ± 4.1     15.6 ± 3.3
waveform         15.73 ± 2.07   14.52 ± 1.59    15.33 ± 1.72   14.54 ± 1.80
wine              1.2 ± 2.5      2.8 ± 3.9       0.6 ± 1.9      3.3 ± 2.9
yeast            49.0 ± 4.3     39.8 ± 4.3      49.5 ± 4.9     39.1 ± 3.8

Table 2. Mean error rates ± standard deviation of the error rate [%], based on 10-fold cross-validation. The best result for each dataset is displayed in boldface, and statistically significant improvements (measured by the analysis of variance using the Tukey-Kramer method at the 5% level) are displayed in italics.