<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decision Trees for Knowledge Representation?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammad Azad</string-name>
          <email>mmazad@ju.edu.sa</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Chikalov</string-name>
          <email>igor.chikalov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikhail Moshkov</string-name>
          <email>mikhail.moshkov@kaust.edu.sa</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jouf University, College of Computer and Information Sciences, Sakaka 72388</institution>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences &amp; Engineering Division, Thuwal</institution>
          <addr-line>23955-6900</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we consider decision trees as a means of knowledge representation. To this end, we design three algorithms for decision tree construction that are based on extensions of dynamic programming. We study three parameters of the decision trees constructed by these algorithms: the number of nodes, the global misclassification rate, and the local misclassification rate.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge representation</kwd>
        <kwd>decision trees</kwd>
        <kwd>extensions of dynamic programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Decision trees are widely used as classifiers, as a means of knowledge
representation, and as algorithms [
        <xref ref-type="bibr" rid="ref3 ref5 ref8">3, 5, 8</xref>
        ]. In this paper, we consider decision trees as a
means of knowledge representation.
      </p>
      <p>
        Let T be a decision table and let Γ be a decision tree for the table T [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We
study three parameters of the tree Γ: the number of nodes N, the global
misclassification rate G, and the local misclassification rate L.
      </p>
      <p>To be understandable, the decision tree should have a reasonable number
of nodes. To properly represent knowledge from the decision table T, the
decision tree should have reasonable accuracy. Considering only the
global misclassification rate may be insufficient: the misclassifications may be
unevenly distributed and, for some terminal nodes, the fraction of
misclassifications can be high. To deal with this situation, we should also consider
the local misclassification rate.</p>
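      <p>As a minimal illustration (a Python sketch; the per-leaf summary format and the reading of the local rate L as the worst misclassification fraction over terminal nodes are our assumptions here, not definitions quoted from [1]), the two accuracy parameters could be computed as follows.</p>
      <preformat><![CDATA[
# Hypothetical sketch: each terminal node (leaf) of a decision tree is
# summarized as (rows, misclassified): how many rows of the decision
# table reach the leaf and how many of them receive a wrong decision.

def global_rate(leaves):
    """G: misclassified rows over all rows of the table."""
    total = sum(rows for rows, _ in leaves)
    wrong = sum(miss for _, miss in leaves)
    return wrong / total

def local_rate(leaves):
    """L (as assumed here): worst misclassification fraction among leaves."""
    return max(miss / rows for rows, miss in leaves if rows > 0)

# Misclassifications concentrated in one small leaf: the tree looks
# accurate globally, but one terminal node is always wrong.
leaves = [(90, 2), (8, 0), (2, 2)]
print(global_rate(leaves))  # 0.04
print(local_rate(leaves))   # 1.0
]]></preformat>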
      <p>
        In this paper, we design three algorithms for decision tree construction which
are applicable to medium-sized decision tables with only categorical attributes.
These algorithms are based on extensions of dynamic programming – bi-criteria
optimization of decision trees relative to the parameters N and G, and relative to
the parameters N and L [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One of the algorithms (GL-algorithm) is completely
new. We apply the considered algorithms to 14 decision tables from the UCI
Machine Learning Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and study the three parameters N, G, and L of
the constructed trees.
      </p>
      <p>
        The obtained results show that at least one of the considered algorithms
(GL-algorithm) can be useful for the extraction of knowledge from
medium-sized decision tables and for its representation by decision trees. This algorithm
can be used in different areas of data analysis, including rough set theory [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        The rest of the paper is organized as follows. In Sect. 2, we describe three
algorithms for decision tree construction. In Sect. 3, we discuss results of
experiments with decision tables from the UCI ML Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Section 4 contains
short conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Three Algorithms for Decision Tree Construction</title>
      <p>
In the book [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we described an algorithm A7 which, for a given decision table,
constructs the set of Pareto optimal points (POPs) for the problem of bi-criteria
optimization of decision trees relative to the parameters N and G (see, for
example, Fig. 1 (a), (c), (e)). The same algorithm A7 can also construct the set of
POPs for the problem of bi-criteria optimization of decision trees relative to the
parameters N and L (see, for example, Fig. 1 (b), (d), (f)). For each POP, we
can derive a decision tree with values of the considered parameters equal to the
coordinates of this point.
      </p>
      <p>We now describe three algorithms for decision tree construction based on the
use of the algorithm A7.</p>
      <sec id="sec-1-1">
        <title>G-Algorithm</title>
        <p>For a given decision table T, we construct, using the algorithm A7, the set of POPs
for the parameters N and G. We normalize the coordinates of the POPs: for each POP, we
divide each coordinate by the maximum value of this coordinate among all POPs.
After that, we choose a normalized POP with the minimum Euclidean distance
from the origin. We restore the coordinates of this point and derive a decision tree
Γ for which the values of the parameters N and G are equal to the restored
coordinates. The tree Γ is the output of G-algorithm.</p>
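        <p>A minimal sketch of this selection step in Python (the function name is ours; pops stands for the list of (N, G) points produced by A7, and the derivation of the tree itself relies on the machinery of [1]):</p>
        <preformat><![CDATA[
import math

def select_pop(pops):
    """Normalize each coordinate by its maximum over all POPs and
    return the point whose normalized image is closest to the origin
    in Euclidean distance (the point keeps its original coordinates)."""
    max_x = max(p[0] for p in pops)
    max_y = max(p[1] for p in pops)

    def dist(p):
        return math.hypot(
            p[0] / max_x if max_x else 0.0,
            p[1] / max_y if max_y else 0.0,
        )

    return min(pops, key=dist)

print(select_pop([(1, 0.30), (5, 0.10), (9, 0.05)]))  # (5, 0.1)
]]></preformat>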
      </sec>
      <sec id="sec-1-2">
        <title>L-Algorithm</title>
        <p>L-algorithm works in the same way as G-algorithm but, instead of the parameters
N and G, it uses the parameters N and L.</p>
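        <p>Under the same assumptions as above, the selection step is reused unchanged; only the input Pareto set differs, for example:</p>
        <preformat><![CDATA[
# L-algorithm (sketch): the same select_pop as above, applied to the
# POPs for the parameters N and L instead of N and G.
print(select_pop([(1, 0.50), (7, 0.12), (20, 0.02)]))  # (7, 0.12)
]]></preformat>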
      </sec>
      <sec id="sec-1-3">
        <title>GL-Algorithm</title>
        <p>We apply G-algorithm to a given decision table T and construct a decision tree
Γ1. After that, using the algorithm A7, we construct the set of POPs for the
parameters N and L, and choose a POP for which the value of the coordinate N
is closest to N(Γ1). At the end, we derive a decision tree Γ2 for which the values
of the parameters N and L are equal to the coordinates of the chosen POP. The
tree Γ2 is the output of GL-algorithm.</p>
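        <p>A sketch of this composition, reusing select_pop from above (again, pops_ng and pops_nl stand for the POP sets produced by A7 for the parameters N, G and N, L respectively; the names are our assumptions):</p>
        <preformat><![CDATA[
def gl_select(pops_ng, pops_nl):
    """GL-algorithm selection (sketch): take N of the tree chosen by
    G-algorithm on the (N, G) POPs, then pick the (N, L) POP whose N
    is closest to it."""
    n1, _ = select_pop(pops_ng)  # number of nodes of the first tree
    return min(pops_nl, key=lambda p: abs(p[0] - n1))

print(gl_select([(1, 0.30), (5, 0.10), (9, 0.05)],
                [(1, 0.50), (7, 0.12), (20, 0.02)]))  # (7, 0.12)
]]></preformat>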
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results of Experiments</title>
      <p>
        We performed experiments with 14 decision tables from the UCI ML Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
described in Table 1.
      </p>
      <p>We applied G-algorithm, L-algorithm, and GL-algorithm to each of these
tables and found values of the parameters N , G, and L for the constructed
decision trees. Results of experiments can be found in Table 2.</p>
      <p>Decision trees constructed by G-algorithm generally have reasonable values of
the parameters N and G but often have high values of the parameter L.</p>
      <p>Decision trees constructed by L-algorithm generally have reasonable values of
the parameters G and L but sometimes have high values of the parameter N.</p>
      <p>[Table 1: the 14 decision tables used in the experiments are balance-scale, breast-cancer, cars, hayes-roth-data, house-votes-84, iris, lenses, lymphography, nursery, shuttle-landing, soybean-small, spect-test, tic-tac-toe, and zoo-data.]</p>
      <p>[Fig. 1. Sets of POPs: (a) breast-cancer, N and G; (b) breast-cancer, N and L; (c) nursery, N and G; (d) nursery, N and L; (e) tic-tac-toe, N and G; (f) tic-tac-toe, N and L. The horizontal axis shows N in each panel.]</p>
      <p>
        Decision trees constructed by GL-algorithm generally have reasonable values
of the parameters N, G, and L. We can use GL-algorithm to construct sufficiently
understandable and accurate decision trees.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        We proposed to evaluate the accuracy of decision trees not only by the global
misclassification rate G but also by the local misclassification rate L, and designed
GL-algorithm, which constructs decision trees with a mostly reasonable number of
nodes and mostly reasonable values of the parameters G and L. Later, we are
planning to extend the considered technique to the case of decision tables with
many-valued decisions using the bi-criteria optimization algorithms described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We are also planning to extend this technique to the case of decision tables with
both categorical and numerical attributes.
      </p>
      <sec id="sec-2-1">
        <title>Acknowledgments</title>
        <p>Research reported in this publication was supported by King Abdullah
University of Science and Technology (KAUST). The authors are greatly indebted to
the anonymous reviewers for useful comments.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>AbouEisha</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Amin</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Chikalov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Hussain</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining</article-title>.
          <source>Intelligent Systems Reference Library</source>, vol.
          <volume>146</volume>. Springer (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Alsolami</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Azad</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Chikalov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions</article-title>.
          <source>Intelligent Systems Reference Library</source>, vol.
          <volume>156</volume>. Springer (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name>,
          <string-name><surname>Olshen</surname>, <given-names>R.A.</given-names></string-name>,
          <string-name><surname>Stone</surname>, <given-names>C.J.</given-names></string-name>:
          <article-title>Classification and Regression Trees</article-title>.
          <source>Wadsworth and Brooks</source>, Monterey, CA (<year>1984</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Lichman</surname>, <given-names>M.</given-names></string-name>:
          <article-title>UCI Machine Learning Repository</article-title>.
          University of California, Irvine, School of Information and Computer Sciences (<year>2013</year>), http://archive.ics.uci.edu/ml
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Time complexity of decision trees</article-title>. In:
          <string-name><surname>Peters</surname>, <given-names>J.F.</given-names></string-name>,
          <string-name><surname>Skowron</surname>, <given-names>A.</given-names></string-name> (eds.)
          <source>Trans. Rough Sets III, Lecture Notes in Computer Science</source>, vol.
          <volume>3400</volume>, pp.
          <fpage>244</fpage>–<lpage>459</lpage>. Springer (<year>2005</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Pawlak</surname>, <given-names>Z.</given-names></string-name>:
          <article-title>Rough Sets – Theoretical Aspects of Reasoning About Data</article-title>.
          Kluwer Academic Publishers, Dordrecht (<year>1991</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Pawlak</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Skowron</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Rudiments of rough sets</article-title>.
          <source>Information Sciences</source>
          <volume>177</volume>(<issue>1</issue>),
          <fpage>3</fpage>–<lpage>27</lpage> (<year>2007</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Rokach</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Maimon</surname>, <given-names>O.</given-names></string-name>:
          <article-title>Data Mining with Decision Trees: Theory and Applications</article-title>.
          World Scientific Publishing, River Edge, NJ (<year>2008</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>