Sorting by Decision Trees with Hypotheses (extended abstract)

Mohammad Azad¹, Igor Chikalov², Shahid Hussain³ and Mikhail Moshkov⁴

¹ Jouf University, Sakaka 72441, Saudi Arabia
² Intel Corporation, 5000 W Chandler Blvd, Chandler, AZ 85226, USA
³ Institute of Business Administration, University Road, Karachi 75270, Pakistan
⁴ King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

Abstract
In this paper, we consider decision trees that use both queries based on one attribute each and queries based on hypotheses about the values of all attributes. Such decision trees are similar to those studied in exact learning, where not only membership but also equivalence queries are allowed. For 𝑛 = 3, . . . , 6, we compare decision trees based on various combinations of attributes and hypotheses for sorting 𝑛 pairwise different elements from a linearly ordered set.

Keywords
decision tree, hypothesis, dynamic programming, sorting

29th International Workshop on Concurrency, Specification and Programming (CS&P'21)
Email: mmazad@ju.edu.sa (M. Azad); igor.chikalov@gmail.com (I. Chikalov); shahidhussain@iba.edu.pk (S. Hussain); mikhail.moshkov@kaust.edu.sa (M. Moshkov)
ORCID: 0000-0001-9851-1420 (M. Azad); 0000-0002-1010-6605 (I. Chikalov); 0000-0002-1698-2809 (S. Hussain); 0000-0003-0085-9483 (M. Moshkov)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Decision trees are widely used in many areas of computer science, for example, test theory (initiated by Chegis and Yablonskii [1]), rough set theory (initiated by Pawlak [2, 3, 4]), and exact learning (initiated by Angluin [5, 6]). These theories are closely related: attributes from rough set theory and test theory correspond to membership queries from exact learning. Exact learning also studies equivalence queries. The notion of a "minimally adequate teacher" using both membership and equivalence queries was discussed by Angluin in [7]. Relations between exact learning and PAC learning, proposed by Valiant [8], were considered in [5].

In [9, 10, 11], we added the notion of a hypothesis (an analog of equivalence queries) to the model considered in both rough set theory and test theory, and proposed dynamic programming algorithms for the optimization of decision trees with hypotheses. Note that dynamic programming algorithms for the optimization of conventional decision trees, which do not use hypotheses, were proposed earlier [12].

In the present paper, we consider an application of the dynamic programming algorithms from [9, 10, 11] to the study of the problem of sorting. We compare the complexity of five types of decision trees, optimal relative to the depth and relative to the number of realizable nodes, based on various combinations of attributes and hypotheses for sorting 𝑛 pairwise different elements from a linearly ordered set, 𝑛 = 3, . . . , 6. The results obtained for the conventional decision trees are known – see the book [12]. The results obtained for the decision trees with hypotheses are completely new.

Note that in the present paper we follow [11] when discussing the notions related to decision trees with hypotheses. Complete definitions of these notions can be found in that paper.

2. Five Types of Decision Trees and Their Optimization

Let 𝑇 be a decision table with 𝑛 conditional attributes 𝑓1, . . . , 𝑓𝑛 that have values from the set 𝜔 = {0, 1, 2, . . .}. The rows of this table are pairwise different, and each row is labeled with a decision. For a given row of 𝑇, we should recognize the decision attached to it. To this end, we use decision trees based on two types of queries.

We can ask about the value of a conditional attribute 𝑓𝑖 ∈ {𝑓1, . . . , 𝑓𝑛} on the given row. As a result, we obtain an answer of the form 𝑓𝑖 = 𝛿, where 𝛿 is the number at the intersection of the given row and the column 𝑓𝑖. We can also ask whether a hypothesis {𝑓1 = 𝛿1, . . . , 𝑓𝑛 = 𝛿𝑛} is true, where the numbers 𝛿1, . . . , 𝛿𝑛 belong to the columns 𝑓1, . . . , 𝑓𝑛, respectively. Either this hypothesis is confirmed or we obtain a counterexample of the form 𝑓𝑖 = 𝜎, where 𝑓𝑖 ∈ {𝑓1, . . . , 𝑓𝑛} and 𝜎 is a number from the column 𝑓𝑖 that is different from 𝛿𝑖. We say that this hypothesis is proper if (𝛿1, . . . , 𝛿𝑛) is a row of the table 𝑇.
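As an illustration of this query protocol, the following minimal sketch (ours, not from the paper) simulates both query types against a hidden row. The table representation and the function names are assumptions; here the answer to a refuted hypothesis is simply the first mismatching attribute, whereas in the worst-case analysis the counterexample may be chosen adversarially.

```python
# A minimal sketch of the two query types, assuming a decision table is stored
# as a list of rows (tuples of values from omega = {0, 1, 2, ...}).
# All names here are illustrative, not taken from the paper.

def attribute_query(hidden_row, i):
    """Answer the query on attribute f_i: returns delta such that f_i = delta."""
    return hidden_row[i]

def hypothesis_query(hidden_row, deltas):
    """Answer the hypothesis {f_1 = delta_1, ..., f_n = delta_n}.

    Returns None if the hypothesis is confirmed; otherwise a counterexample
    (i, sigma) meaning f_i = sigma with sigma != delta_i.  We return the first
    mismatch, but any mismatching attribute is a legal answer.
    """
    for i, delta in enumerate(deltas):
        if hidden_row[i] != delta:
            return (i, hidden_row[i])
    return None

def is_proper(table, deltas):
    """A hypothesis is proper if (delta_1, ..., delta_n) is a row of the table."""
    return tuple(deltas) in {tuple(row) for row in table}
```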
We study the following five types of decision trees:

1. Decision trees based on attributes only.
2. Decision trees based on hypotheses only.
3. Decision trees based on both attributes and hypotheses.
4. Decision trees based on proper hypotheses only.
5. Decision trees based on both attributes and proper hypotheses.

As the time complexity of a decision tree, we consider its depth, which is equal to the maximum number of queries in a path from the root to a terminal node of the tree. As the space complexity, we consider the number of nodes in the tree that are realizable relative to 𝑇. A node is called realizable relative to 𝑇 if the computation in the tree passes through this node for some row of 𝑇 and some choice of counterexamples. We use the following notation:

• ℎ(𝑘)(𝑇) denotes the minimum depth of a decision tree of type 𝑘 for 𝑇, 𝑘 = 1, . . . , 5.
• 𝐿(𝑘)(𝑇) denotes the minimum number of nodes realizable relative to 𝑇 in a decision tree of type 𝑘 for 𝑇, 𝑘 = 1, . . . , 5.

In [9] and [10], dynamic programming algorithms for the optimization of decision trees of all five types relative to the depth and the number of realizable nodes were proposed (see also the journal extension [11] of these papers, which additionally considers two cost functions: the number of realizable terminal nodes and the number of nonterminal nodes). Note that algorithms for the minimization of the depth and the number of nodes of decision trees of type 1 were considered in [12] for decision tables with one-valued decisions and in [13] for decision tables with many-valued decisions.

Dynamic programming optimization algorithms are applicable to medium-sized decision tables. These algorithms first construct a directed acyclic graph (DAG) whose nodes are certain subtables of the original decision table, each given by a set of conditions of the form "attribute = value". They then pass through all the nodes of the DAG, starting with the simplest subtables, and for each subtable find the minimum value of the considered cost function.
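As an illustration of this scheme, the following sketch computes ℎ(1)(𝑇) for a small table by a memoized recursion over subtables; the cache keyed by sets of row indices plays the role of the DAG of subtables. It is a minimal version covering only type-1 trees and the depth, not the authors' implementation; the representation of the table is our own assumption, and the handling of hypotheses and of the other cost functions in [9, 10, 11] follows the same bottom-up pattern but is more involved.

```python
# A minimal sketch of the dynamic programming scheme for h^(1)(T): the minimum
# depth of a decision tree that uses attribute queries only.  The table is
# assumed to be given as a list of rows plus a parallel list of decisions.
from functools import lru_cache

def min_depth_type1(rows, decisions):
    n_attrs = len(rows[0])

    @lru_cache(maxsize=None)
    def h(subtable):
        # subtable: frozenset of row indices, i.e. one node of the DAG
        idx = list(subtable)
        if len({decisions[i] for i in idx}) == 1:
            return 0  # all rows share one decision: no queries needed
        best = None
        for a in range(n_attrs):
            values = {rows[i][a] for i in idx}
            if len(values) == 1:
                continue  # a constant attribute gives no information here
            # one query plus the worst case over the possible answers f_a = v
            cost = 1 + max(
                h(frozenset(i for i in idx if rows[i][a] == v)) for v in values
            )
            if best is None or cost < best:
                best = cost
        return best

    return h(frozenset(range(len(rows))))
```

On the sorting table 𝑇3 constructed in the next section, this sketch returns 3, in agreement with the value of ℎ(1)(𝑇3) in Table 1 below.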
In the present paper, we use the algorithms proposed in [9, 10, 11] to study decision trees of all five types that are optimal relative to the depth and relative to the number of realizable nodes for the sorting problem. Results for decision trees of type 1 were obtained earlier [12]. Results for decision trees of types 2–5 are new.

3. Problem of Sorting

In this paper, we study the problem of sorting 𝑛 elements. Let 𝑥1, . . . , 𝑥𝑛 be pairwise different elements from a linearly ordered set. We should find a permutation (𝑝1, . . . , 𝑝𝑛) from the set 𝑃𝑛 of all permutations of the set {1, . . . , 𝑛} for which 𝑥𝑝1 < · · · < 𝑥𝑝𝑛. To this end, we use attributes 𝑥𝑖 : 𝑥𝑗, 𝑖, 𝑗 ∈ {1, . . . , 𝑛}, 𝑖 < 𝑗, such that 𝑥𝑖 : 𝑥𝑗 = 1 if 𝑥𝑖 < 𝑥𝑗 and 𝑥𝑖 : 𝑥𝑗 = 0 if 𝑥𝑖 > 𝑥𝑗.

The problem of sorting 𝑛 elements can be represented as a decision table 𝑇𝑛 with 𝑛(𝑛 − 1)/2 conditional attributes 𝑥𝑖 : 𝑥𝑗, 𝑖, 𝑗 ∈ {1, . . . , 𝑛}, 𝑖 < 𝑗, and 𝑛! rows corresponding to the permutations from 𝑃𝑛. For each permutation (𝑝1, . . . , 𝑝𝑛), the corresponding row of 𝑇𝑛 is labeled with this permutation as its decision. This row is filled with the values of the attributes 𝑥𝑖 : 𝑥𝑗 such that 𝑥𝑖 : 𝑥𝑗 = 1 if and only if 𝑖 appears before 𝑗 in the tuple (𝑝1, . . . , 𝑝𝑛).
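The table 𝑇𝑛 can be generated directly from this definition, as the following sketch shows; the representation matches the earlier sketches, and the function name is our own.

```python
# A minimal sketch constructing the decision table T_n for sorting n elements:
# one attribute x_i : x_j per pair i < j, one row per permutation of {1,...,n}.
from itertools import combinations, permutations

def sorting_table(n):
    pairs = list(combinations(range(1, n + 1), 2))  # attributes x_i : x_j, i < j
    table = []
    for p in permutations(range(1, n + 1)):
        pos = {elem: k for k, elem in enumerate(p)}  # position of each element in p
        # x_i : x_j = 1 iff i appears before j in the tuple (p_1, ..., p_n)
        row = tuple(1 if pos[i] < pos[j] else 0 for i, j in pairs)
        table.append((row, p))  # the row is labeled with its permutation
    return pairs, table

pairs, table = sorting_table(4)
assert len(pairs) == 4 * 3 // 2  # n(n-1)/2 conditional attributes
assert len(table) == 24          # n! rows, one per permutation from P_n
```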
For 𝑛 = 3, . . . , 6 and 𝑘 = 1, . . . , 5, we found the values of ℎ(𝑘)(𝑇𝑛) and 𝐿(𝑘)(𝑇𝑛) using the dynamic programming algorithms described in [9, 10, 11] – see the results in Tables 1 and 2.

Table 1
Experimental results for the depth

  𝑛   ℎ(1)(𝑇𝑛)   ℎ(2)(𝑇𝑛)   ℎ(3)(𝑇𝑛)   ℎ(4)(𝑇𝑛)   ℎ(5)(𝑇𝑛)
  3       3           2           2           2           2
  4       5           4           4           4           4
  5       7           6           6           6           6
  6      10           9           9           9           9

Table 2
Experimental results for the number of realizable nodes

  𝑛   𝐿(1)(𝑇𝑛)   𝐿(2)(𝑇𝑛)   𝐿(3)(𝑇𝑛)   𝐿(4)(𝑇𝑛)   𝐿(5)(𝑇𝑛)
  3      11          13           9          14           9
  4      47         253          39         254          39
  5     239      15,071         199      15,142         199
  6   1,439   2,885,086       1,199   2,886,752       1,199

From the obtained experimental results it follows that decision trees of types 2–5 can have smaller depth than decision trees of type 1. Decision trees of types 3 and 5 can have fewer realizable nodes than decision trees of type 1. Decision trees of types 2 and 4, however, require far more nodes: for 𝑛 = 6, almost three million realizable nodes against 1,439 for type 1.

4. Conclusions

In this paper, we found the minimum depth and the minimum number of realizable nodes of five types of decision trees for sorting 𝑛 elements, 𝑛 = 3, . . . , 6. In the future, we plan to study the joint behavior of the depth and the number of nodes in such decision trees. It would also be interesting to compare the complexity of optimal decision trees of the considered five types constructed by dynamic programming algorithms with the complexity of decision trees constructed by the entropy-based greedy algorithm proposed in [14].

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The authors are indebted to the anonymous reviewers for interesting comments.

References

[1] I. A. Chegis, S. V. Yablonskii, Logical methods of control of work of electric schemes, Trudy Mat. Inst. Steklov (in Russian) 51 (1958) 270–360.
[2] Z. Pawlak, Rough sets, Int. J. Parallel Program. 11 (1982) 341–356.
[3] Z. Pawlak, Rough Sets – Theoretical Aspects of Reasoning about Data, volume 9 of Theory and Decision Library: Series D, Kluwer, 1991.
[4] Z. Pawlak, A. Skowron, Rudiments of rough sets, Inf. Sci. 177 (2007) 3–27.
[5] D. Angluin, Queries and concept learning, Mach. Learn. 2 (1988) 319–342.
[6] D. Angluin, Queries revisited, Theor. Comput. Sci. 313 (2004) 175–194.
[7] D. Angluin, Learning regular sets from queries and counterexamples, Inf. Comput. 75 (1987) 87–106.
[8] L. G. Valiant, A theory of the learnable, Commun. ACM 27 (1984) 1134–1142.
[9] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Minimizing depth of decision trees with hypotheses (to appear), in: International Joint Conference on Rough Sets (IJCRS 2021), 19–24 September 2021, Bratislava, Slovakia, 2021.
[10] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Minimizing number of nodes in decision trees with hypotheses (to appear), in: 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2021), 8–10 September 2021, Szczecin, Poland, 2021.
[11] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Optimization of decision trees with hypotheses for knowledge representation, Electronics 10 (2021) 1580. URL: https://doi.org/10.3390/electronics10131580.
[12] H. AbouEisha, T. Amin, I. Chikalov, S. Hussain, M. Moshkov, Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining, volume 146 of Intelligent Systems Reference Library, Springer, 2019.
[13] F. Alsolami, M. Azad, I. Chikalov, M. Moshkov, Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions, volume 156 of Intelligent Systems Reference Library, Springer, 2020.
[14] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Entropy-based greedy algorithm for decision trees using hypotheses, Entropy 23 (2021) 808. URL: https://doi.org/10.3390/e23070808.