=Paper=
{{Paper
|id=Vol-2571/CSP2019_paper_1
|storemode=property
|title=Decision Trees for Knowledge Representation (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2571/CSP2019_paper_1.pdf
|volume=Vol-2571
|authors=Mohammad Azad,Igor Chikalov,Mikhail Moshkov
|dblpUrl=https://dblp.org/rec/conf/csp/AzadCM19
}}
==Decision Trees for Knowledge Representation (short paper)==
Decision Trees for Knowledge Representation

Mohammad Azad¹, Igor Chikalov², and Mikhail Moshkov²

¹ Jouf University, College of Computer and Information Sciences, Sakaka 72388, Saudi Arabia; mmazad@ju.edu.sa
² King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences & Engineering Division, Thuwal 23955-6900, Saudi Arabia; igor.chikalov@gmail.com, mikhail.moshkov@kaust.edu.sa

Abstract. In this paper, we consider decision trees as a means of knowledge representation. To this end, we design three algorithms for decision tree construction that are based on extensions of dynamic programming. We study three parameters of the decision trees constructed by these algorithms: the number of nodes, the global misclassification rate, and the local misclassification rate.

Keywords: knowledge representation, decision trees, extensions of dynamic programming

1 Introduction

Decision trees are widely used as classifiers, as a means of knowledge representation, and as algorithms [3, 5, 8]. In this paper, we consider decision trees as a means of knowledge representation.

Let T be a decision table and Γ be a decision tree for the table T [1]. We study three parameters of the tree Γ:

– N(Γ) – the number of nodes in Γ.
– G(T, Γ) – the global misclassification rate, equal to the number of misclassifications of Γ on T divided by the number of rows in T.
– L(T, Γ) – the local misclassification rate. For each terminal node v of Γ, we find the number of misclassifications of Γ on the rows of T for which the computation finishes in v, divided by the number of rows of T for which the computation finishes in v, and take the maximum of this parameter over all terminal nodes of Γ.

It is easy to show that G(T, Γ) ≤ L(T, Γ).

To be understandable, the decision tree Γ should have a reasonable number of nodes. To properly represent knowledge from the decision table T, the decision tree Γ should have reasonable accuracy.
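The two error measures are easy to make concrete. The following is a minimal sketch (not the authors' implementation) that computes G(T, Γ) and L(T, Γ) from per-terminal-node counts; the list-of-pairs representation of terminal-node statistics is a hypothetical stand-in for a real tree.

```python
# Hypothetical per-terminal-node statistics for a tree Γ on a table T:
# each pair is (rows of T finishing in the node, misclassified among them).
terminal_stats = [(50, 2), (30, 6), (20, 1)]

def global_rate(stats):
    """G(T, Γ): total misclassifications divided by total rows."""
    total_rows = sum(rows for rows, _ in stats)
    total_errors = sum(err for _, err in stats)
    return total_errors / total_rows

def local_rate(stats):
    """L(T, Γ): maximum per-terminal-node misclassification fraction."""
    return max(err / rows for rows, err in stats)

G = global_rate(terminal_stats)   # (2 + 6 + 1) / 100 = 0.09
L = local_rate(terminal_stats)    # 6 / 30 = 0.20
assert G <= L  # the inequality G(T, Γ) ≤ L(T, Γ) from the text
```

Since G is a row-weighted average of the per-node fractions and L is their maximum, the inequality G(T, Γ) ≤ L(T, Γ) holds for any such statistics.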
The consideration of only the global misclassification rate may be insufficient: the misclassifications may be unevenly distributed and, for some terminal nodes, the fraction of misclassifications can be high. To deal with this situation, we should also consider the local misclassification rate.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this paper, we design three algorithms for decision tree construction that are applicable to medium-sized decision tables with only categorical attributes. These algorithms are based on extensions of dynamic programming – bi-criteria optimization of decision trees relative to the parameters N and G, and relative to the parameters N and L [1]. One of the algorithms (GL-algorithm) is completely new. We apply the considered algorithms to 14 decision tables from the UCI Machine Learning Repository [4] and study the three parameters N, G, and L of the constructed trees.

The obtained results show that at least one of the considered algorithms (GL-algorithm) can be useful for the extraction of knowledge from medium-sized decision tables and for its representation by decision trees. This algorithm can be used in different areas of data analysis, including rough set theory [6, 7].

The rest of the paper is organized as follows. In Sect. 2, we describe three algorithms for decision tree construction. In Sect. 3, we discuss results of experiments with decision tables from the UCI ML Repository [4]. Section 4 contains short conclusions.

2 Three Algorithms for Decision Tree Construction

In the book [1], we described an algorithm A7 which, for a given decision table, constructs the set of Pareto optimal points (POPs) for the problem of bi-criteria optimization of decision trees relative to the parameters N and G (see, for example, Fig. 1 (a), (c), (e)).
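The algorithm A7 itself is beyond the scope of this short paper, but the notion of a set of Pareto optimal points is simple to illustrate. The sketch below (an illustration of Pareto optimality, not of A7) filters a hypothetical set of candidate (N, G) pairs down to those not dominated by any other pair; since both criteria are minimized, a pair dominates another if it is no worse in both coordinates and differs from it.

```python
def pareto_front(points):
    """Keep the points not dominated by any other point.

    (n1, g1) dominates (n2, g2) if n1 <= n2 and g1 <= g2 with at least
    one inequality strict; both coordinates are minimized.
    """
    def dominated(p, q):
        # q weakly dominates p in both coordinates and is not p itself,
        # hence q is strictly better in at least one coordinate.
        return q[0] <= p[0] and q[1] <= p[1] and q != p

    return sorted(p for p in points
                  if not any(dominated(p, q) for q in points))

# Hypothetical candidate (N, G) pairs for decision trees of one table.
candidates = [(3, 0.30), (5, 0.20), (7, 0.20), (9, 0.05), (11, 0.05)]
print(pareto_front(candidates))  # [(3, 0.3), (5, 0.2), (9, 0.05)]
```

Here (7, 0.20) and (11, 0.05) are discarded because smaller trees achieve the same misclassification rate; the surviving points trace the trade-off curves of the kind shown in Fig. 1.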
The same algorithm A7 can also construct the set of POPs for the problem of bi-criteria optimization of decision trees relative to the parameters N and L (see, for example, Fig. 1 (b), (d), (f)). For each POP, we can derive a decision tree with values of the considered parameters equal to the coordinates of this point. We now describe three algorithms for decision tree construction based on the use of the algorithm A7.

2.1 G-Algorithm

For a given decision table T, we construct, using the algorithm A7, the set of POPs for the parameters N and G. We normalize the coordinates of the POPs: for each POP, we divide each coordinate by the maximum value of this coordinate among all POPs. After that, we choose a normalized POP with the minimum Euclidean distance from the origin. We restore the coordinates of this point and derive a decision tree Γ for which the values of the parameters N and G are equal to the restored coordinates. The tree Γ is the output of G-algorithm.

2.2 L-Algorithm

L-algorithm works in the same way as G-algorithm, but instead of the parameters N and G it uses the parameters N and L.

2.3 GL-Algorithm

We apply G-algorithm to a given decision table T and construct a decision tree Γ1. After that, using the algorithm A7, we construct the set of POPs for the parameters N and L and choose a POP for which the value of the coordinate N is closest to N(Γ1). At the end, we derive a decision tree Γ2 for which the values of the parameters N and L are equal to the coordinates of the chosen POP. The tree Γ2 is the output of GL-algorithm.

3 Results of Experiments

We made experiments with 14 decision tables from the UCI ML Repository [4] described in Table 1.

Table 1.
Decision tables used in experiments:

Decision table    Number of rows  Number of attributes
balance-scale     625             5
breast-cancer     266             10
cars              1728            7
hayes-roth-data   69              5
house-votes-84    279             17
iris              150             5
lenses            10              5
lymphography      148             19
nursery           12960           9
shuttle-landing   15              7
soybean-small     47              36
spect-test        169             23
tic-tac-toe       958             10
zoo-data          59              17

We applied G-algorithm, L-algorithm, and GL-algorithm to each of these tables and found the values of the parameters N, G, and L for the constructed decision trees. Results of experiments can be found in Table 2.

Decision trees constructed by G-algorithm have overall reasonable values of the parameters N and G but often have high values of the parameter L. Decision trees constructed by L-algorithm have overall reasonable values of the parameters G and L but sometimes have high values of the parameter N.

[Fig. 1. Sets of Pareto optimal points for decision tables breast-cancer, nursery, and tic-tac-toe for pairs of parameters N, G and N, L. Panels: (a) breast-cancer, N and G; (b) breast-cancer, N and L; (c) nursery, N and G; (d) nursery, N and L; (e) tic-tac-toe, N and G; (f) tic-tac-toe, N and L.]

Decision trees constructed by GL-algorithm have overall reasonable values of the parameters N, G, and L. We can use GL-algorithm to construct sufficiently understandable and accurate decision trees.

Table 2.
Results of experiments:

Decision table    G-algorithm         L-algorithm         GL-algorithm
                  N     G     L       N     G     L       N     G     L
balance-scale     106   0.18  0.40    186   0.14  0.24    91    0.22  0.32
breast-cancer     59    0.11  0.33    58    0.13  0.16    58    0.13  0.16
cars              98    0.08  0.50    338   0.01  0.08    135   0.14  0.29
hayes-roth-data   23    0.16  0.50    26    0.13  0.33    26    0.13  0.33
house-votes-84    3     0.06  0.12    11    0.03  0.06    3     0.06  0.13
iris              5     0.04  0.09    5     0.04  0.09    5     0.04  0.09
lenses            6     0.10  0.50    8     0.00  0.00    8     0.00  0.00
lymphography      13    0.13  0.33    11    0.16  0.20    11    0.16  0.20
nursery           74    0.08  0.34    115   0.09  0.22    70    0.10  0.23
shuttle-landing   7     0.20  0.33    5     0.27  0.31    5     0.27  0.31
soybean-small     3     0.42  0.55    3     0.43  0.50    3     0.43  0.50
spect-test        17    0.02  0.10    19    0.02  0.02    17    0.02  0.03
tic-tac-toe       72    0.11  0.46    82    0.12  0.19    72    0.15  0.20
zoo-data          8     0.19  0.43    9     0.20  0.28    9     0.20  0.28
Average           35.29 0.13  0.36    62.57 0.13  0.19    36.64 0.15  0.22

4 Conclusions

We proposed to evaluate the accuracy of decision trees not only by the global misclassification rate G but also by the local misclassification rate L, and designed GL-algorithm, which constructs decision trees with mostly reasonable numbers of nodes and mostly reasonable values of the parameters G and L. Later, we are planning to extend the considered technique to the case of decision tables with many-valued decisions, using the bi-criteria optimization algorithms described in [2]. We are also planning to extend this technique to the case of decision tables with both categorical and numerical attributes.

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The authors are greatly indebted to the anonymous reviewers for useful comments.

References

1. AbouEisha, H., Amin, T., Chikalov, I., Hussain, S., Moshkov, M.: Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining. Intelligent Systems Reference Library, vol. 146. Springer (2019)
2.
Alsolami, F., Azad, M., Chikalov, I., Moshkov, M.: Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions. Intelligent Systems Reference Library, vol. 156. Springer (2020)
3. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA (1984)
4. Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013), http://archive.ics.uci.edu/ml
5. Moshkov, M.: Time complexity of decision trees. In: Peters, J.F., Skowron, A. (eds.) Trans. Rough Sets III. Lecture Notes in Computer Science, vol. 3400, pp. 244–459. Springer (2005)
6. Pawlak, Z.: Rough Sets – Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht (1991)
7. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177(1), 3–27 (2007)
8. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing, River Edge, NJ (2008)