<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decision Trees for Knowledge Representation?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammad Azad</string-name>
          <email>mmazad@ju.edu.sa</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Chikalov</string-name>
          <email>igor.chikalov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikhail Moshkov</string-name>
          <email>mikhail.moshkov@kaust.edu.sa</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jouf University, College of Computer and Information Sciences, Sakaka 72388</institution>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences &amp; Engineering Division, Thuwal</institution>
          <addr-line>23955-6900</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we consider decision trees as a means of knowledge representation. To this end, we design three algorithms for decision tree construction that are based on extensions of dynamic programming. We study three parameters of the decision trees constructed by these algorithms: the number of nodes, the global misclassification rate, and the local misclassification rate.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge representation</kwd>
        <kwd>decision trees</kwd>
        <kwd>extensions of dynamic programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Decision trees are widely used as classifiers, as a means of knowledge
representation, and as algorithms [
        <xref ref-type="bibr" rid="ref3 ref5 ref8">3, 5, 8</xref>
        ]. In this paper, we consider decision trees as a
means of knowledge representation.
      </p>
      <p>
        Let T be a decision table and let Γ be a decision tree for the table T [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We
study three parameters of the tree Γ: the number of nodes N, the global
misclassification rate G, and the local misclassification rate L.
      </p>
      <p>To be understandable, the decision tree should have a reasonable number
of nodes. To properly represent knowledge from the decision table T, the
decision tree should have reasonable accuracy. Considering only the
global misclassification rate may be insufficient: the misclassifications may be
unevenly distributed and, for some terminal nodes, the fraction of
misclassifications can be high. To deal with this situation, we should also consider
the local misclassification rate.</p>
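      <p>As a minimal illustration (a Python sketch; the per-leaf summary format and the reading of the local rate L as the worst misclassification fraction over terminal nodes are our assumptions here, not definitions quoted from [1]), the two accuracy parameters could be computed as follows.</p>
      <preformat><![CDATA[
# Hypothetical sketch: each terminal node (leaf) of a decision tree is
# summarized as (rows, misclassified): how many rows of the decision
# table reach the leaf and how many of them receive a wrong decision.

def global_rate(leaves):
    """G: misclassified rows over all rows of the table."""
    total = sum(rows for rows, _ in leaves)
    wrong = sum(miss for _, miss in leaves)
    return wrong / total

def local_rate(leaves):
    """L (as assumed here): worst misclassification fraction among leaves."""
    return max(miss / rows for rows, miss in leaves if rows > 0)

# Misclassifications concentrated in one small leaf: the tree looks
# accurate globally, but one terminal node is always wrong.
leaves = [(90, 2), (8, 0), (2, 2)]
print(global_rate(leaves))  # 0.04
print(local_rate(leaves))   # 1.0
]]></preformat>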
      <p>
        In this paper, we design three algorithms for decision tree construction which
are applicable to medium-sized decision tables with only categorical attributes.
These algorithms are based on extensions of dynamic programming – bi-criteria
optimization of decision trees relative to the parameters N and G, and relative to
the parameters N and L [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One of the algorithms (GL-algorithm) is completely
new. We apply the considered algorithms to 14 decision tables from the UCI
Machine Learning Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and study the three parameters N, G, and L of
the constructed trees.
      </p>
      <p>
        The obtained results show that at least one of the considered algorithms
(GL-algorithm) can be useful for the extraction of knowledge from
medium-sized decision tables and for its representation by decision trees. This algorithm
can be used in different areas of data analysis, including rough set theory [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        The rest of the paper is organized as follows. In Sect. 2, we describe three
algorithms for decision tree construction. In Sect. 3, we discuss results of
experiments with decision tables from the UCI ML Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Section 4 contains
short conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Three Algorithms for Decision Tree Construction</title>
      <p>
In the book [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we described an algorithm A7 which, for a given decision table,
constructs the set of Pareto optimal points (POPs) for the problem of bi-criteria
optimization of decision trees relative to the parameters N and G (see, for
example, Fig. 1 (a), (c), (e)). The same algorithm A7 can also construct the set of
POPs for the problem of bi-criteria optimization of decision trees relative to the
parameters N and L (see, for example, Fig. 1 (b), (d), (f)). For each POP, we
can derive a decision tree with values of the considered parameters equal to the
coordinates of this point.
      </p>
      <p>We now describe three algorithms for decision tree construction based on the
use of the algorithm A7.</p>
      <sec id="sec-1-1">
        <title>G-Algorithm</title>
        <p>For a given decision table T, we construct, using the algorithm A7, the set of POPs
for the parameters N and G. We normalize the coordinates of the POPs: for each POP, we
divide each coordinate by the maximum value of this coordinate among all POPs.
After that, we choose a normalized POP with the minimum Euclidean distance
from the origin. We restore the coordinates of this point and derive a decision tree
Γ for which the values of the parameters N and G are equal to the restored
coordinates. The tree Γ is the output of G-algorithm.</p>
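        <p>A minimal sketch of this selection step in Python (the function name is ours; pops stands for the list of (N, G) points produced by A7, and the derivation of the tree itself relies on the machinery of [1]):</p>
        <preformat><![CDATA[
import math

def select_pop(pops):
    """Normalize each coordinate by its maximum over all POPs and
    return the point whose normalized image is closest to the origin
    in Euclidean distance (the point keeps its original coordinates)."""
    max_x = max(p[0] for p in pops)
    max_y = max(p[1] for p in pops)

    def dist(p):
        return math.hypot(
            p[0] / max_x if max_x else 0.0,
            p[1] / max_y if max_y else 0.0,
        )

    return min(pops, key=dist)

print(select_pop([(1, 0.30), (5, 0.10), (9, 0.05)]))  # (5, 0.1)
]]></preformat>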
      </sec>
      <sec id="sec-1-2">
        <title>L-Algorithm</title>
        <p>L-algorithm works in the same way as G-algorithm but, instead of the parameters
N and G, it uses the parameters N and L.</p>
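        <p>Under the same assumptions as above, the selection step is reused unchanged; only the input Pareto set differs, for example:</p>
        <preformat><![CDATA[
# L-algorithm (sketch): the same select_pop as above, applied to the
# POPs for the parameters N and L instead of N and G.
print(select_pop([(1, 0.50), (7, 0.12), (20, 0.02)]))  # (7, 0.12)
]]></preformat>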
      </sec>
      <sec id="sec-1-3">
        <title>GL-Algorithm</title>
        <p>We apply G-algorithm to a given decision table T and construct a decision tree
Γ1. After that, using the algorithm A7, we construct the set of POPs for the
parameters N and L, and choose a POP for which the value of the coordinate N
is closest to N(Γ1). At the end, we derive a decision tree Γ2 for which the values
of the parameters N and L are equal to the coordinates of the chosen POP. The
tree Γ2 is the output of GL-algorithm.</p>
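        <p>A sketch of this composition, reusing select_pop from above (again, pops_ng and pops_nl stand for the POP sets produced by A7 for the parameters N, G and N, L respectively; the names are our assumptions):</p>
        <preformat><![CDATA[
def gl_select(pops_ng, pops_nl):
    """GL-algorithm selection (sketch): take N of the tree chosen by
    G-algorithm on the (N, G) POPs, then pick the (N, L) POP whose N
    is closest to it."""
    n1, _ = select_pop(pops_ng)  # number of nodes of the first tree
    return min(pops_nl, key=lambda p: abs(p[0] - n1))

print(gl_select([(1, 0.30), (5, 0.10), (9, 0.05)],
                [(1, 0.50), (7, 0.12), (20, 0.02)]))  # (7, 0.12)
]]></preformat>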
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results of Experiments</title>
      <p>
        We performed experiments with 14 decision tables from the UCI ML Repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
described in Table 1.
      </p>
      <p>We applied G-algorithm, L-algorithm, and GL-algorithm to each of these
tables and found values of the parameters N , G, and L for the constructed
decision trees. Results of experiments can be found in Table 2.</p>
      <p>Decision trees constructed by G-algorithm generally have reasonable values of
the parameters N and G but often have high values of the parameter L.</p>
      <p>Decision trees constructed by L-algorithm generally have reasonable values of
the parameters G and L but sometimes have high values of the parameter N.</p>
      <p>[Table 1: the 14 decision tables used in the experiments are balance-scale, breast-cancer, cars, hayes-roth-data, house-votes-84, iris, lenses, lymphography, nursery, shuttle-landing, soybean-small, spect-test, tic-tac-toe, and zoo-data.]</p>
      <p>[Fig. 1. Sets of POPs: (a) breast-cancer, N and G; (b) breast-cancer, N and L; (c) nursery, N and G; (d) nursery, N and L; (e) tic-tac-toe, N and G; (f) tic-tac-toe, N and L. The horizontal axis shows N in each panel.]</p>
      <p>
        Decision trees constructed by GL-algorithm generally have reasonable values
of the parameters N, G, and L. We can use GL-algorithm to construct sufficiently
understandable and accurate decision trees.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        We proposed to evaluate the accuracy of decision trees not only by the global
misclassification rate G but also by the local misclassification rate L, and designed
GL-algorithm, which constructs decision trees with a mostly reasonable number of
nodes and mostly reasonable values of the parameters G and L. Later, we are
planning to extend the considered technique to the case of decision tables with
many-valued decisions using the bi-criteria optimization algorithms described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We are also planning to extend this technique to the case of decision tables with
both categorical and numerical attributes.
      </p>
      <sec id="sec-2-1">
        <title>Acknowledgments</title>
        <p>Research reported in this publication was supported by King Abdullah
University of Science and Technology (KAUST). The authors are greatly indebted to
the anonymous reviewers for useful comments.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>AbouEisha</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Amin</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Chikalov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Hussain</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining</article-title>.
          <source>Intelligent Systems Reference Library</source>, vol.
          <volume>146</volume>. Springer (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Alsolami</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Azad</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Chikalov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions</article-title>.
          <source>Intelligent Systems Reference Library</source>, vol.
          <volume>156</volume>. Springer (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name>,
          <string-name><surname>Olshen</surname>, <given-names>R.A.</given-names></string-name>,
          <string-name><surname>Stone</surname>, <given-names>C.J.</given-names></string-name>:
          <article-title>Classification and Regression Trees</article-title>.
          <source>Wadsworth and Brooks</source>, Monterey, CA (<year>1984</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Lichman</surname>, <given-names>M.</given-names></string-name>:
          <article-title>UCI Machine Learning Repository</article-title>.
          University of California, Irvine, School of Information and Computer Sciences (<year>2013</year>), http://archive.ics.uci.edu/ml
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Moshkov</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Time complexity of decision trees</article-title>. In:
          <string-name><surname>Peters</surname>, <given-names>J.F.</given-names></string-name>,
          <string-name><surname>Skowron</surname>, <given-names>A.</given-names></string-name> (eds.)
          <source>Trans. Rough Sets III, Lecture Notes in Computer Science</source>, vol.
          <volume>3400</volume>, pp.
          <fpage>244</fpage>–<lpage>459</lpage>. Springer (<year>2005</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Pawlak</surname>, <given-names>Z.</given-names></string-name>:
          <article-title>Rough Sets – Theoretical Aspects of Reasoning About Data</article-title>.
          Kluwer Academic Publishers, Dordrecht (<year>1991</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Pawlak</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Skowron</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Rudiments of rough sets</article-title>.
          <source>Information Sciences</source>
          <volume>177</volume>(<issue>1</issue>),
          <fpage>3</fpage>–<lpage>27</lpage> (<year>2007</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Rokach</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Maimon</surname>, <given-names>O.</given-names></string-name>:
          <article-title>Data Mining with Decision Trees: Theory and Applications</article-title>.
          World Scientific Publishing, River Edge, NJ (<year>2008</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>