<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Average Depth and Number of Misclassifications for Decision Trees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Chikalov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shahid Hussain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikhail Moshkov</string-name>
          <email>mikhail.moshkov@kaust.edu.sa</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mathematical and Computer Sciences &amp; Engineering Division, King Abdullah University of Science and Technology</institution>
          ,
          <addr-line>Thuwal 23955-6900</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a new tool for the study of relationships between the total path length (average depth) and the number of misclassifications for decision trees. In addition to the algorithm, the paper also presents the results of experiments with datasets from the UCI ML Repository [1].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Decision trees are widely used as predictors, as a way of knowledge representation, and as algorithms for problem solving. These uses require optimizing decision trees for certain cost functions such as the number of misclassifications, depth/average depth, and number of nodes. That is, minimizing one of these cost functions yields more accurate, faster, or more understandable decision trees, respectively.</p>
      <p>
        We have created a software system for decision trees (as well as decision rules) called Dagger, a tool based on dynamic programming which allows us to sequentially optimize decision trees (and decision rules) relative to various cost functions such as depth (length), average depth (average length), total number of nodes, and number of misclassifications [2-5]. The aim of this paper is to study the relationships between total path length (average depth) and number of misclassifications for decision trees, and to present a new tool (an extension of our software) for computing such relationships. We also consider the work of this tool on decision tables from the UCI ML Repository [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The presented algorithm and its implementation in the software tool Dagger, together with similar algorithms devised by the authors (see, for example, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) can be useful for investigations in Rough Sets [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] where decision trees are used as classifiers [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>This paper is divided into six sections, including the Introduction. Section 2 presents basic notions about decision tables/trees and cost functions. In Sect. 3 an algorithm for constructing a directed acyclic graph for the representation of decision trees is presented. Section 4 presents the main algorithm and results of this paper. Section 5 shows experimental results of the work of the algorithm presented in this paper on different datasets, and we conclude the paper in Sect. 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Basic Notions</title>
      <p>In the following section we define the main notions related to the study of decision trees and tables and the different cost functions for decision tree construction.</p>
      <p>2.1 Decision Tables and Decision Trees</p>
      <p>In this paper, we consider only decision tables with discrete attributes. These tables contain neither missing values nor equal rows. Consider a decision table T depicted in Fig. 1: its columns are labeled with f1, ..., fm, d, and its rows are (b11, ..., b1m, c1), ..., (bN1, ..., bNm, cN). Here f1, ..., fm are the names of columns (conditional attributes); c1, ..., cN are nonnegative integers which can be interpreted as decisions (values of the decision attribute d); bij are nonnegative integers which are interpreted as values of conditional attributes (we assume that the rows (b11, ..., b1m), ..., (bN1, ..., bNm) are pairwise different). We denote by E(T) the set of attributes (columns of the table T), each of which contains different values. For fi ∈ E(T), let E(T, fi) be the set of values from the column fi.</p>
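      <p>The subtable notation above can be illustrated with a short sketch (our own Python illustration, not part of the paper's Dagger system; all names are ours), encoding rows as tuples whose last entry is the decision:</p>

```python
# Sketch (our illustration): a decision table as a list of rows, each row
# a tuple of conditional-attribute values plus a final decision value.
# subtable(...) builds the separable subtable T(f_i1, a1)...(f_it, at).

def subtable(rows, constraints):
    """Rows of T whose attribute i equals a, for every (i, a) constraint."""
    return [r for r in rows if all(r[i] == a for (i, a) in constraints)]

def nontrivial_attributes(rows, m):
    """E(T): indices of columns that contain more than one value."""
    return [i for i in range(m) if len({r[i] for r in rows}) > 1]

rows = [
    (0, 0, 1),  # f1=0, f2=0, decision 1
    (0, 1, 1),
    (1, 0, 2),
    (1, 1, 1),
]
print(subtable(rows, [(0, 1)]))        # T(f1, 1): [(1, 0, 2), (1, 1, 1)]
print(nontrivial_attributes(rows, 2))  # both columns vary: [0, 1]
```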
      <p>Let fi1, ..., fit ∈ {f1, ..., fm} and a1, ..., at be nonnegative integers. We denote by T(fi1, a1)...(fit, at) the subtable of the table T which consists of exactly those rows of T that have the numbers a1, ..., at at the intersections with the columns fi1, ..., fit, respectively. Such nonempty tables (including the table T) will be called separable subtables of the table T. For a subtable Θ of the table T, we will denote by R(Θ) the number of unordered pairs of rows of Θ that are labeled with different decisions. A minimum decision value which is attached to the maximum number of rows in a nonempty subtable Θ will be called the most common decision for Θ.</p>
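      <p>The quantities R(Θ) and the most common decision can be computed directly from the counts of decision values; a minimal sketch (ours, continuing the row-as-tuple encoding above):</p>

```python
from collections import Counter

# Sketch (our illustration): R(rows) counts unordered pairs of rows with
# different decisions; most_common_decision picks the minimum decision
# among those attached to the maximum number of rows.

def R(rows):
    counts = Counter(r[-1] for r in rows)   # decision -> row count
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return total_pairs - same_pairs

def most_common_decision(rows):
    counts = Counter(r[-1] for r in rows)
    best = max(counts.values())
    return min(d for d, c in counts.items() if c == best)

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 2), (1, 1, 1)]
print(R(rows))                     # 3 pairs mix decisions 1 and 2
print(most_common_decision(rows))  # 1
```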
      <p>A decision tree Γ over the table T is a finite directed tree with root in which each terminal node is labeled with a decision. Each nonterminal node is labeled with a conditional attribute, and for each nonterminal node the outgoing edges are labeled with pairwise different nonnegative integers. Let v be an arbitrary node of Γ. We now define a subtable T(v) of the table T. If v is the root, then T(v) = T. Let v be a node of Γ that is not the root, let the nodes in the path from the root to v be labeled with attributes fi1, ..., fit, and let the edges in this path be labeled with the values a1, ..., at, respectively. Then T(v) = T(fi1, a1)...(fit, at).</p>
      <p>Let Γ be a decision tree over T. We will say that Γ is a decision tree for T if any node v of Γ satisfies the following conditions:
- If R(T(v)) = 0, then v is a terminal node labeled with the most common decision for T(v);
- Otherwise, v is either a terminal node labeled with the most common decision for T(v), or v is labeled with an attribute fi ∈ E(T(v)) and, if E(T(v), fi) = {a1, ..., at}, then t edges leave node v, and these edges are labeled with a1, ..., at, respectively.</p>
      <p>Let Γ be a decision tree for T. For any row r of T, there exists exactly one terminal node v of Γ such that r belongs to the table T(v). Let v be labeled with the decision b. We will say that b is the result of the work of the decision tree Γ on r. We denote by N(T(v)) the number of rows in the subtable T(v) and by N(T(v), b) the number of rows in T(v) labeled with the decision b.</p>
      <p>For an arbitrary row r of the decision table T, we denote by l(r) the length of the path from the root to the terminal node v of Γ such that r is in T(v). The total path length, denoted Λ(T, Γ), is the sum of the path lengths l(r) for all rows r in T. That is,
Λ(T, Γ) = Σr l(r),
where we take the sum over all rows r of the table T. Note that the average depth of Γ relative to T, denoted havg(T, Γ), is equal to the total path length divided by the total number of rows in T, i.e.,
havg(T, Γ) = Λ(T, Γ) / N(T).
We will drop T when it is obvious from the context. That is, we will write Λ(Γ) instead of Λ(T, Γ) if T is known.</p>
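      <p>Total path length and average depth can be sketched as follows (our own tree encoding, not the paper's: a terminal node is ('leaf', b) and an internal node stores an attribute index together with its branches):</p>

```python
# Sketch (our illustration): a terminal node is ('leaf', b); an internal
# node is ('node', i, branches) where branches maps each value of the
# attribute f_{i+1} to a subtree.  l(r) is the depth of the terminal
# node that row r reaches.

def path_length(tree, row):
    depth = 0
    while tree[0] == 'node':
        _, i, branches = tree
        tree = branches[row[i]]
        depth += 1
    return depth

def total_path_length(tree, rows):
    """Lambda(T, Gamma): sum of l(r) over all rows r of T."""
    return sum(path_length(tree, r) for r in rows)

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 2), (1, 1, 1)]
tree = ('node', 0, {0: ('leaf', 1),
                    1: ('node', 1, {0: ('leaf', 2), 1: ('leaf', 1)})})
lam = total_path_length(tree, rows)
print(lam)              # 1 + 1 + 2 + 2 = 6
print(lam / len(rows))  # average depth h_avg = 1.5
```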
      <p>The number of misclassifications of a decision tree Γ for the table T, denoted μ(Γ) = μ(T, Γ), is the number of rows r in T for which the result of the work of the decision tree Γ on r is not equal to the decision attached to the row r. We should note that the cost functions Λ and μ are bounded from above by values depending on the size of the table: mN and N are upper bounds for Λ and μ, respectively, for a decision table with m conditional attributes and N rows.</p>
      <p>3 Representation of Sets of Decision Trees</p>
      <p>Consider an algorithm for the construction of a graph Δ(T) which represents the set of all decision trees for the table T. Nodes of this graph are some separable subtables of the table T. During each step we process one node and mark it with the symbol *. We start with the graph that consists of one node T and finish when all nodes of the graph are processed.</p>
      <p>Let the algorithm have already performed p steps. We now describe step number (p + 1). If all nodes are processed, then the work of the algorithm is finished, and the resulting graph is Δ(T). Otherwise, choose a node (table) Θ that has not been processed yet. If R(Θ) = 0, label the considered node with the common decision b for Θ, mark it with the symbol * and proceed to step number (p + 2). If R(Θ) &gt; 0, then for each fi ∈ E(Θ) draw a bundle of edges from the node Θ (this bundle of edges will be called the fi-bundle). Let E(Θ, fi) = {a1, ..., at}. Then draw t edges from Θ and label these edges with the pairs (fi, a1), ..., (fi, at), respectively. These edges enter the nodes Θ(fi, a1), ..., Θ(fi, at). If some of the nodes Θ(fi, a1), ..., Θ(fi, at) are not present in the graph, then add these nodes to the graph. Mark the node Θ with the symbol * and proceed to step number (p + 2).</p>
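      <p>The construction above can be sketched as follows (our illustration, not the authors' Dagger implementation; subtables are stored as frozensets of rows, and each bundle is recorded as labeled edges):</p>

```python
# Sketch (our illustration) of the construction of the graph Delta(T):
# nodes are separable subtables (frozensets of rows); for each
# unprocessed node Theta with R(Theta) > 0 and each attribute in
# E(Theta) we add the bundle of child subtables Theta(f_i, a).

def build_dag(rows, m):
    root = frozenset(rows)
    dag = {root: []}          # node -> list of (attr, value, child)
    stack = [root]
    while stack:
        theta = stack.pop()
        decisions = {r[-1] for r in theta}
        if len(decisions) == 1:   # R(Theta) = 0: terminal node
            continue
        for i in range(m):
            values = {r[i] for r in theta}
            if len(values) == 1:  # f_i not in E(Theta)
                continue
            for a in sorted(values):
                child = frozenset(r for r in theta if r[i] == a)
                if child not in dag:
                    dag[child] = []
                    stack.append(child)
                dag[theta].append((i, a, child))
    return dag

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 2), (1, 1, 1)]
dag = build_dag(rows, 2)
print(len(dag))  # 8 separable subtables appear in Delta(T) here
```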
      <p>[Fig. 2 shows the trivial decision tree consisting of a single node labeled with a decision b; Fig. 3 shows a decision tree whose root is labeled with an attribute fi and whose outgoing edges a1, ..., at enter the decision trees Γ1, ..., Γt.]</p>
      <p>Now, for each node Θ of the graph Δ(T), we describe the set of decision trees corresponding to the node Θ. We will move from the terminal nodes, which are labeled with numbers, to the node T. Let Θ be a node which is labeled with a number b. Then the only trivial decision tree, depicted in Fig. 2, corresponds to the node Θ.</p>
      <p>Let Θ be a nonterminal node (table); then there is a number of bundles of edges starting in Θ. We consider an arbitrary bundle and describe the set of decision trees corresponding to this bundle. Let the considered bundle be an fi-bundle, where fi ∈ E(Θ) and E(Θ, fi) = {a1, ..., at}. Let Γ1, ..., Γt be decision trees from the sets corresponding to the nodes Θ(fi, a1), ..., Θ(fi, at). Then the decision tree depicted in Fig. 3 belongs to the set of decision trees which correspond to this bundle. All such decision trees belong to the considered set, and this set does not contain any other decision trees. Then the set of decision trees corresponding to the node Θ coincides with the union of the sets of decision trees corresponding to the bundles starting in Θ and the set containing the one decision tree depicted in Fig. 2, where b is the most common decision for Θ. We denote by D(Θ) the set of decision trees corresponding to the node Θ.</p>
      <p>The following proposition shows that the graph Δ(T) can represent all decision trees for the table T.</p>
      <p>Proposition 1. Let T be a decision table and Θ a node in the graph Δ(T). Then the set D(Θ) coincides with the set of all decision trees for the table Θ.</p>
    </sec>
    <sec id="sec-3">
      <title>Relationships</title>
      <p>In the following, we consider relationships between the average depth (total path length) and the number of misclassifications for decision trees and give an algorithm to compute these relationships. We also provide an illustration of the working of the algorithm on an example decision table.</p>
      <p>Let T be a decision table with N rows and m columns labeled with f1, ..., fm. Let Θ be a node of Δ(T) and D(Θ) be the set of all decision trees for Θ (as discussed in Sect. 2). We will use the notion of total path length instead of average depth for clarity and ease of implementation.</p>
      <p>We define two sets BΛ = {0, 1, ..., mN} and Bμ = {0, 1, ..., N} and functions on these sets, FΘ : BΛ → Bμ and GΘ : Bμ → BΛ, as follows:
FΘ(n) = min{μ(Γ) : Γ ∈ D(Θ), Λ(Γ) ≤ n}, n ∈ BΛ;
GΘ(n) = min{Λ(Γ) : Γ ∈ D(Θ), μ(Γ) ≤ n}, n ∈ Bμ.</p>
      <p>We now describe an algorithm which allows us to construct the function FΘ for every node Θ of the graph Δ(T). We begin from the terminal nodes and move upward to the node T.</p>
      <p>Let Θ be a terminal node. This means that all rows of Θ are labeled with the same decision b, and the decision tree Γb depicted in Fig. 2 belongs to D(Θ). It is clear that Λ(Γb) = μ(Γb) = 0 for the table Θ. Therefore FΘ(n) = 0 for any n ∈ BΛ.</p>
      <p>Let us consider a nonterminal node Θ and a bundle of edges which start from this node. Let these edges be labeled with the pairs (fi, a1), ..., (fi, at) and enter the nodes Θ(fi, a1), ..., Θ(fi, at), respectively, to which the functions FΘ(fi,a1), ..., FΘ(fi,at) are already attached.</p>
      <p>We correspond to this bundle (the fi-bundle) the function FΘ^fi: for any n ∈ BΛ with n ≥ N(Θ),
FΘ^fi(n) = min{ FΘ(fi,a1)(n1) + ... + FΘ(fi,at)(nt) },
where the minimum is taken over all n1, ..., nt such that nj ∈ BΛ for j = 1, ..., t and n1 + ... + nt ≤ n − N(Θ). [It should be noted that computing FΘ^fi is a nontrivial task. We describe the method in detail in the following subsection.]</p>
      <p>Furthermore, we know that there is only one decision tree in D(Θ) for which the total path length is at most N(Θ) − 1. This is the tree Γb depicted in Fig. 2, where b is the most common decision for Θ. Therefore, for any n ∈ BΛ with n &lt; N(Θ),
FΘ^fi(n) = N(Θ) − N(Θ, b).</p>
      <p>We can use the following proposition to construct the function GT (using the
method of transformation of functions described in the Appendix).</p>
      <sec id="sec-3-1">
        <title>Proposition 2. For any n ∈ Bμ,</title>
        <p>GT(n) = min{p ∈ BΛ : FT(p) ≤ n}.</p>
        <p>Note that to find the value GT(n) for some n ∈ Bμ it is enough to make O(log |BΛ|) = O(log(mN)) operations of comparison.</p>
        <p>Let Θ be a nonterminal node in Δ(T), fi ∈ E(Θ), and E(Θ, fi) = {a1, ..., at}. Furthermore, we assume that the functions FΘ(fi,aj), for j = 1, ..., t, have already been computed. Let the values of FΘ(fi,aj) be given by the tuple of pairs
(0, γ^j_0), (1, γ^j_1), ..., (mN, γ^j_mN),
where γ^j_k = FΘ(fi,aj)(k) for k = 0, ..., mN. We need to compute FΘ^fi(n) for all n ∈ BΛ with n ≥ N(Θ), where the minimum in the definition of FΘ^fi is taken over nj ∈ BΛ such that n1 + ... + nt ≤ n − N(Θ).</p>
        <p>We construct a layered directed acyclic graph (DAG) Δ(Θ, fi) to compute FΘ^fi as follows. The DAG Δ(Θ, fi) contains nodes arranged in t + 1 layers l0, l1, ..., lt. Each node has a pair of labels, and each layer lj (1 ≤ j ≤ t) contains j·mN + 1 nodes. The first entry of the label of a node in layer lj is an integer from {0, 1, ..., j·mN}. The layer l0 contains only one node, labeled with (0, 0).</p>
        <p>Each node in layer lj (0 ≤ j &lt; t) has mN + 1 outgoing edges to nodes in layer lj+1. These edges are labeled with the corresponding pairs in FΘ(fi,aj+1). A node with x as the first entry of its label-pair in layer lj connects to the nodes with first entries x + 0 to x + mN in layer lj+1, with edges labeled (0, γ^{j+1}_0), (1, γ^{j+1}_1), ..., (mN, γ^{j+1}_mN), respectively.</p>
        <p>The function FΘ^fi(n) for n ∈ BΛ \ {0} can be easily computed using the DAG Δ(Θ, fi), for Θ ∈ Δ(T) and for the considered bundle of edges for the attribute fi ∈ E(Θ), as follows:</p>
        <p>Each node in layer l1 gets its second value copied from the second value of the label of its incoming edge (since there is only one incoming edge for each node in layer l1). Let (k, ψ) be a node in layer lj (2 ≤ j ≤ t). Let E = {(k1, ψ1), (k2, ψ2), ..., (kr, ψr)} be the set of nodes with edges entering (k, ψ), and let (λ1, γ1), (λ2, γ2), ..., (λr, γr) be the labels of these edges between the nodes in E and (k, ψ), respectively. It is clear that k = ki + λi for 1 ≤ i ≤ r. We set
ψ = min{ψi + γi : 1 ≤ i ≤ r}.</p>
        <p>We do this for every node, layer by layer, until all nodes in Δ(Θ, fi) have received their second label.</p>
        <p>Once we finish computing the second values of the label-pairs in layer lt, we can use these labels to compute FΘ^fi(n). Let (k1, ψ1), ..., (ks, ψs) be all the label-pairs of the nodes in lt. One can show that for all n ∈ BΛ with n ≥ N(Θ),
FΘ^fi(n) = min{ψq : q ∈ {1, ..., s}, kq ≤ n − N(Θ)}.</p>
        <p>An example of the working of the algorithm can be found in Fig. 4. It is important to note that in this example we only show the values of total path length possible for the set of trees that can be constructed for the given subtable (table), together with their corresponding numbers of misclassifications.</p>
        <p>
          We performed several experiments on datasets (decision tables) acquired from the UCI ML Repository [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The resulting plots are depicted in Figs. 5, 6, 7, and 8.
        </p>
        <p>It is clear to see in the above figures that when the total path length is zero we get the maximum number of misclassifications (as it is the base case), which makes the figures less clear in other regions. We remove the point (0, 67) from Fig. 5 and show the resulting plot for the average depth instead of the total path length in Fig. 9. Similarly, Fig. 10 shows the inverse of the plot shown in Fig. 6 without the base case (0, 3916).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This paper is devoted to the consideration of a new tool for the study of decision trees. We present and explain in detail the algorithm to compute the relationships between the total path length (average depth) and the number of misclassifications for decision trees. That is, for a given decision table we can find the minimum number of misclassifications for a given value of total path length when we consider the exact decision trees. Future studies will be connected with the extension of this tool to other cost functions, especially the relationships between the depth and the average depth of decision trees.</p>
      <p>Appendix: Transformation of Functions</p>
      <p>Let f and g be two functions from a set A onto Cf and Cg, respectively, where Cf and Cg are finite sets of nonnegative integers. Let Bf = {mf, mf + 1, ..., Mf} and Bg = {ng, ng + 1, ..., Ng}, where mf = min{m : m ∈ Cf} and ng = min{n : n ∈ Cg}. Furthermore, Mf and Ng are natural numbers such that m ≤ Mf and n ≤ Ng for any m ∈ Cf and n ∈ Cg, respectively.</p>
      <p>We define two functions F : Bg → Bf and G : Bf → Bg as follows:
F(n) = min{f(a) : a ∈ A, g(a) ≤ n}, for all n ∈ Bg;
G(m) = min{g(a) : a ∈ A, f(a) ≤ m}, for all m ∈ Bf. (1)
It is clear that both F and G are nonincreasing functions.</p>
      <p>The following proposition states that the functions F and G can be used
interchangeably and we can evaluate F using G and vice versa, i.e., it is enough
to know only one function to evaluate the other.</p>
      <sec id="sec-4-1">
        <title>Proposition 3.</title>
        <p>For any n ∈ Bg,
F(n) = min{m ∈ Bf : G(m) ≤ n}, (2)
and for any m ∈ Bf,
G(m) = min{n ∈ Bg : F(n) ≤ m}.</p>
        <sec id="sec-4-1-1">
          <title>Proof</title>
          <p>Let, for some n ∈ Bg,
F(n) = m0, (3)
and furthermore assume that
t = min{m ∈ Bf : G(m) ≤ n}. (4)
From (3) it follows that:
(i) there exists b ∈ A such that g(b) ≤ n and f(b) = m0;
(ii) for any a ∈ A, if g(a) ≤ n then f(a) ≥ m0.</p>
          <p>From (i) it follows that G(m0) ≤ n. This implies t ≤ m0. Let us assume that t &lt; m0. In this case, there exists m1 &lt; m0 for which G(m1) ≤ n. Therefore, there exists a ∈ A such that f(a) ≤ m1 and g(a) ≤ n; but from (ii) it follows that f(a) ≥ m0, which is impossible. So t = m0.</p>
          <p>Similarly we can prove the second part of the statement.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Transformation of G into F</title>
          <p>Proposition 3 allows us to transform the function G, given by the tuple (G(mf), G(mf + 1), ..., G(Mf)), into the function F and vice versa. Since G(mf) ≥ G(mf + 1) ≥ ... ≥ G(Mf), to find the minimum m ∈ Bf such that G(m) ≤ n we can use binary search, which requires O(log |Bf|) comparisons of numbers. So, to find the value F(n) for n ∈ Bg, it is enough to make O(log |Bf|) operations of comparison.</p>
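          <p>The binary search of this transformation can be sketched as follows (our illustration; G is stored as the tuple above, reversed so that standard bisection over a nondecreasing sequence applies):</p>

```python
from bisect import bisect_left

# Sketch (our illustration) of the transformation of Proposition 3:
# given the nonincreasing tuple (G(m_f), ..., G(M_f)), the value
# F(n) = min{m : G(m) is at most n} is found by binary search.  Since G
# is nonincreasing, we search its reversed (nondecreasing) form.

def F_from_G(G, m_f, n):
    """G is the list [G(m_f), G(m_f + 1), ..., G(M_f)]."""
    rev = G[::-1]                  # nondecreasing
    j = bisect_left(rev, n + 1)    # first reversed index whose value exceeds n
    if j == 0:
        return None                # no m with G(m) at most n
    return m_f + (len(G) - j)      # smallest original m with G(m) at most n

G = [5, 3, 3, 1, 0]  # G(0), ..., G(4): nonincreasing
print(F_from_G(G, 0, 3))  # smallest m with G(m) at most 3, namely 1
print(F_from_G(G, 0, 0))  # smallest m with G(m) = 0, namely 4
```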
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asuncion</surname>
            ,
            <given-names>A.:</given-names>
          </string-name>
          <article-title>UCI Machine Learning Repository (</article-title>
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alkhalid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chikalov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshkov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On algorithm for building of optimal α-decision trees</article-title>
          . In Szczuka, M.S.,
          <string-name>
            <surname>Kryszkiewicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
          </string-name>
          , Q., eds.
          <source>: RSCTC</source>
          <year>2010</year>
          ,
          <article-title>LNCS (LNAI)</article-title>
          . Volume
          <volume>6086</volume>
          ., Heidelberg, Springer (
          <year>2010</year>
          )
          <fpage>438</fpage>
          -
          <lpage>445</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alkhalid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chikalov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshkov</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A tool for study of optimal decision trees</article-title>
          . In Yu, J.,
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lingras</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skowron</surname>
          </string-name>
          , A., eds.
          <source>: RSKT</source>
          . Volume LNCS
          <volume>6401</volume>
          ., Springer (
          <year>2010</year>
          )
          <fpage>353</fpage>
          -
          <lpage>360</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Alkhalid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chikalov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshkov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Extensions of dynamic programming as a new tool for decision tree optimization</article-title>
          .
          <source>Volume 13 of SIST</source>
          . Springer, Heidelberg (
          <year>2012</year>
          )
          <fpage>16</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Alkhalid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chikalov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshkov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zielosko</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Dagger: a tool for analysis and optimization of decision trees and rules</article-title>
          . In: Computational Informatics, Social Factors and New Information Technologies:
          <article-title>Hypermedia Perspectives and Avant-Garde Experiencies in the Era of Communicability Expansion</article-title>
          . Blue Herons
          (
          <year>2011</year>
          )
          <fpage>29</fpage>
          -
          <lpage>39</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chikalov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshkov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Relationships between depth and number of misclassifications for decision trees</article-title>
          . In Kuznetsov,
          <string-name>
            <given-names>S.O.</given-names>
            ,
            <surname>Slezak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Hepting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.H.</given-names>
            ,
            <surname>Mirkin</surname>
          </string-name>
          , B., eds.: Thirteenth International Conference on Rough Sets, Fuzzy Sets,
          <source>Data Mining and Granular Computing (RSFDGrC</source>
          <year>2011</year>
          ). Volume LNCS
          <volume>6743</volume>
          ., Springer (
          <year>2011</year>
          )
          <fpage>286</fpage>
          -
          <lpage>292</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pawlak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <source>Theoretical Aspects of Reasoning about Data</source>
          . Kluwer Academic Publishers, Dordrecht (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Skowron</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauszer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The discernibility matrices and functions in information systems</article-title>
          . In Slowinski, R., ed.:
          <article-title>Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory</article-title>
          , Dordrecht, Kluwer Academic Publishers (
          <year>1992</year>
          )
          <fpage>331</fpage>
          -
          <lpage>362</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.:</given-names>
          </string-name>
          <article-title>From optimal hyperplanes to optimal decision trees</article-title>
          .
          <source>Fundamenta Informaticae</source>
          <volume>34</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>1998</year>
          )
          <fpage>145</fpage>
          -
          <lpage>174</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>