=Paper=
{{Paper
|id=Vol-2571/CSP2019_paper_1
|storemode=property
|title=Decision Trees for Knowledge Representation (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2571/CSP2019_paper_1.pdf
|volume=Vol-2571
|authors=Mohammad Azad,Igor Chikalov,Mikhail Moshkov
|dblpUrl=https://dblp.org/rec/conf/csp/AzadCM19
}}
==Decision Trees for Knowledge Representation (short paper)==
Decision Trees for Knowledge Representation

Mohammad Azad¹, Igor Chikalov², and Mikhail Moshkov²

¹ Jouf University, College of Computer and Information Sciences, Sakaka 72388, Saudi Arabia; mmazad@ju.edu.sa
² King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences & Engineering Division, Thuwal 23955-6900, Saudi Arabia; igor.chikalov@gmail.com, mikhail.moshkov@kaust.edu.sa

Abstract. In this paper, we consider decision trees as a means of knowledge representation. To this end, we design three algorithms for decision tree construction that are based on extensions of dynamic programming. We study three parameters of the decision trees constructed by these algorithms: the number of nodes, the global misclassification rate, and the local misclassification rate.

Keywords: knowledge representation, decision trees, extensions of dynamic programming

1 Introduction

Decision trees are widely used as classifiers, as a means of knowledge representation, and as algorithms [3, 5, 8]. In this paper, we consider decision trees as a means of knowledge representation.

Let T be a decision table and Γ be a decision tree for the table T [1]. We study three parameters of the tree Γ:

– N(Γ) – the number of nodes in Γ.
– G(T, Γ) – the global misclassification rate, equal to the number of misclassifications of Γ on T divided by the number of rows in T.
– L(T, Γ) – the local misclassification rate. For each terminal node v of Γ, we find the number of misclassifications of Γ on the rows of T for which the computation finishes in v, divided by the number of rows of T for which the computation finishes in v, and take the maximum of this parameter over all terminal nodes of Γ.

It is easy to show that G(T, Γ) ≤ L(T, Γ).

To be understandable, the decision tree Γ should have a reasonable number of nodes. To properly represent knowledge from the decision table T, the decision tree Γ should have reasonable accuracy.
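The two error measures are easy to make concrete. The following is a minimal sketch (not the authors' implementation) that computes G(T, Γ) and L(T, Γ) from per-terminal-node counts; the list-of-pairs representation of terminal-node statistics is a hypothetical stand-in for a real tree.

```python
# Hypothetical per-terminal-node statistics for a tree Γ on a table T:
# each pair is (rows of T finishing in the node, misclassified among them).
terminal_stats = [(50, 2), (30, 6), (20, 1)]

def global_rate(stats):
    """G(T, Γ): total misclassifications divided by total rows."""
    total_rows = sum(rows for rows, _ in stats)
    total_errors = sum(err for _, err in stats)
    return total_errors / total_rows

def local_rate(stats):
    """L(T, Γ): maximum per-terminal-node misclassification fraction."""
    return max(err / rows for rows, err in stats)

G = global_rate(terminal_stats)   # (2 + 6 + 1) / 100 = 0.09
L = local_rate(terminal_stats)    # 6 / 30 = 0.20
assert G <= L  # the inequality G(T, Γ) ≤ L(T, Γ) from the text
```

Since G is a row-weighted average of the per-node fractions and L is their maximum, the inequality G(T, Γ) ≤ L(T, Γ) holds for any such statistics.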
The consideration of only the global misclassification rate may be insufficient: the misclassifications may be unevenly distributed and, for some terminal nodes, the fraction of misclassifications can be high. To deal with this situation, we should also consider the local misclassification rate.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this paper, we design three algorithms for decision tree construction that are applicable to medium-sized decision tables with only categorical attributes. These algorithms are based on extensions of dynamic programming – bi-criteria optimization of decision trees relative to the parameters N and G, and relative to the parameters N and L [1]. One of the algorithms (GL-algorithm) is completely new. We apply the considered algorithms to 14 decision tables from the UCI Machine Learning Repository [4] and study the three parameters N, G, and L of the constructed trees.

The obtained results show that at least one of the considered algorithms (GL-algorithm) can be useful for the extraction of knowledge from medium-sized decision tables and for its representation by decision trees. This algorithm can be used in different areas of data analysis, including rough set theory [6, 7].

The rest of the paper is organized as follows. In Sect. 2, we describe three algorithms for decision tree construction. In Sect. 3, we discuss results of experiments with decision tables from the UCI ML Repository [4]. Section 4 contains short conclusions.

2 Three Algorithms for Decision Tree Construction

In the book [1], we described an algorithm A7 which, for a given decision table, constructs the set of Pareto optimal points (POPs) for the problem of bi-criteria optimization of decision trees relative to the parameters N and G (see, for example, Fig. 1 (a), (c), (e)).
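The algorithm A7 itself is beyond the scope of this short paper, but the notion of a set of Pareto optimal points is simple to illustrate. The sketch below (an illustration of Pareto optimality, not of A7) filters a hypothetical set of candidate (N, G) pairs down to those not dominated by any other pair; since both criteria are minimized, a pair dominates another if it is no worse in both coordinates and differs from it.

```python
def pareto_front(points):
    """Keep the points not dominated by any other point.

    (n1, g1) dominates (n2, g2) if n1 <= n2 and g1 <= g2 with at least
    one inequality strict; both coordinates are minimized.
    """
    def dominated(p, q):
        # q weakly dominates p in both coordinates and is not p itself,
        # hence q is strictly better in at least one coordinate.
        return q[0] <= p[0] and q[1] <= p[1] and q != p

    return sorted(p for p in points
                  if not any(dominated(p, q) for q in points))

# Hypothetical candidate (N, G) pairs for decision trees of one table.
candidates = [(3, 0.30), (5, 0.20), (7, 0.20), (9, 0.05), (11, 0.05)]
print(pareto_front(candidates))  # [(3, 0.3), (5, 0.2), (9, 0.05)]
```

Here (7, 0.20) and (11, 0.05) are discarded because smaller trees achieve the same misclassification rate; the surviving points trace the trade-off curves of the kind shown in Fig. 1.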
The same algorithm A7 can also construct the set of POPs for the problem of bi-criteria optimization of decision trees relative to the parameters N and L (see, for example, Fig. 1 (b), (d), (f)). For each POP, we can derive a decision tree with values of the considered parameters equal to the coordinates of this point. We now describe three algorithms for decision tree construction based on the use of the algorithm A7.

2.1 G-Algorithm

For a given decision table T, we construct, using the algorithm A7, the set of POPs for the parameters N and G. We normalize the coordinates of the POPs: for each POP, we divide each coordinate by the maximum value of this coordinate among all POPs. After that, we choose a normalized POP with the minimum Euclidean distance from the origin. We restore the coordinates of this point and derive a decision tree Γ for which the values of the parameters N and G are equal to the restored coordinates. The tree Γ is the output of G-algorithm.

2.2 L-Algorithm

L-algorithm works in the same way as G-algorithm, but instead of the parameters N and G it uses the parameters N and L.

2.3 GL-Algorithm

We apply G-algorithm to a given decision table T and construct a decision tree Γ1. After that, using the algorithm A7, we construct the set of POPs for the parameters N and L and choose a POP for which the value of the coordinate N is closest to N(Γ1). At the end, we derive a decision tree Γ2 for which the values of the parameters N and L are equal to the coordinates of the chosen POP. The tree Γ2 is the output of GL-algorithm.

3 Results of Experiments

We made experiments with 14 decision tables from the UCI ML Repository [4] described in Table 1.

Table 1.
Decision tables used in experiments:

Decision table    Number of rows  Number of attributes
balance-scale     625             5
breast-cancer     266             10
cars              1728            7
hayes-roth-data   69              5
house-votes-84    279             17
iris              150             5
lenses            10              5
lymphography      148             19
nursery           12960           9
shuttle-landing   15              7
soybean-small     47              36
spect-test        169             23
tic-tac-toe       958             10
zoo-data          59              17

We applied G-algorithm, L-algorithm, and GL-algorithm to each of these tables and found the values of the parameters N, G, and L for the constructed decision trees. Results of experiments can be found in Table 2.

Decision trees constructed by G-algorithm have overall reasonable values of the parameters N and G but often have high values of the parameter L. Decision trees constructed by L-algorithm have overall reasonable values of the parameters G and L but sometimes have high values of the parameter N.

[Fig. 1. Sets of Pareto optimal points for decision tables breast-cancer, nursery, and tic-tac-toe for pairs of parameters N, G and N, L. Panels: (a) breast-cancer, N and G; (b) breast-cancer, N and L; (c) nursery, N and G; (d) nursery, N and L; (e) tic-tac-toe, N and G; (f) tic-tac-toe, N and L.]

Decision trees constructed by GL-algorithm have overall reasonable values of the parameters N, G, and L. We can use GL-algorithm to construct sufficiently understandable and accurate decision trees.

Table 2.
Results of experiments:

Decision table    G-algorithm         L-algorithm         GL-algorithm
                  N     G     L       N     G     L       N     G     L
balance-scale     106   0.18  0.40    186   0.14  0.24    91    0.22  0.32
breast-cancer     59    0.11  0.33    58    0.13  0.16    58    0.13  0.16
cars              98    0.08  0.50    338   0.01  0.08    135   0.14  0.29
hayes-roth-data   23    0.16  0.50    26    0.13  0.33    26    0.13  0.33
house-votes-84    3     0.06  0.12    11    0.03  0.06    3     0.06  0.13
iris              5     0.04  0.09    5     0.04  0.09    5     0.04  0.09
lenses            6     0.10  0.50    8     0.00  0.00    8     0.00  0.00
lymphography      13    0.13  0.33    11    0.16  0.20    11    0.16  0.20
nursery           74    0.08  0.34    115   0.09  0.22    70    0.10  0.23
shuttle-landing   7     0.20  0.33    5     0.27  0.31    5     0.27  0.31
soybean-small     3     0.42  0.55    3     0.43  0.50    3     0.43  0.50
spect-test        17    0.02  0.10    19    0.02  0.02    17    0.02  0.03
tic-tac-toe       72    0.11  0.46    82    0.12  0.19    72    0.15  0.20
zoo-data          8     0.19  0.43    9     0.20  0.28    9     0.20  0.28
Average           35.29 0.13  0.36    62.57 0.13  0.19    36.64 0.15  0.22

4 Conclusions

We proposed to evaluate the accuracy of decision trees not only by the global misclassification rate G but also by the local misclassification rate L, and designed GL-algorithm, which constructs decision trees with mostly reasonable numbers of nodes and mostly reasonable values of the parameters G and L. Later, we are planning to extend the considered technique to the case of decision tables with many-valued decisions, using the bi-criteria optimization algorithms described in [2]. We are also planning to extend this technique to the case of decision tables with both categorical and numerical attributes.

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The authors are greatly indebted to the anonymous reviewers for useful comments.

References

1. AbouEisha, H., Amin, T., Chikalov, I., Hussain, S., Moshkov, M.: Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining. Intelligent Systems Reference Library, vol. 146. Springer (2019)
2.
Alsolami, F., Azad, M., Chikalov, I., Moshkov, M.: Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions. Intelligent Systems Reference Library, vol. 156. Springer (2020)
3. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA (1984)
4. Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013), http://archive.ics.uci.edu/ml
5. Moshkov, M.: Time complexity of decision trees. In: Peters, J.F., Skowron, A. (eds.) Trans. Rough Sets III. Lecture Notes in Computer Science, vol. 3400, pp. 244–459. Springer (2005)
6. Pawlak, Z.: Rough Sets – Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht (1991)
7. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177(1), 3–27 (2007)
8. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing, River Edge, NJ (2008)