Sorting by Decision Trees with Hypotheses (extended abstract)

Mohammad Azad¹, Igor Chikalov², Shahid Hussain³ and Mikhail Moshkov⁴

¹ Jouf University, Sakaka 72441, Saudi Arabia
² Intel Corporation, 5000 W Chandler Blvd, Chandler, AZ 85226, USA
³ Institute of Business Administration, University Road, Karachi 75270, Pakistan
⁴ King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

Abstract
In this paper, we consider decision trees that use both queries based on one attribute each and queries based on hypotheses about the values of all attributes. Such decision trees are similar to those studied in exact learning, where not only membership but also equivalence queries are allowed. For 𝑛 = 3, . . . , 6, we compare decision trees based on various combinations of attributes and hypotheses for sorting 𝑛 pairwise different elements from a linearly ordered set.

Keywords
decision tree, hypothesis, dynamic programming, sorting

29th International Workshop on Concurrency, Specification and Programming (CS&P'21)
Email: mmazad@ju.edu.sa (M. Azad); igor.chikalov@gmail.com (I. Chikalov); shahidhussain@iba.edu.pk (S. Hussain); mikhail.moshkov@kaust.edu.sa (M. Moshkov)
ORCID: 0000-0001-9851-1420 (M. Azad); 0000-0002-1010-6605 (I. Chikalov); 0000-0002-1698-2809 (S. Hussain); 0000-0003-0085-9483 (M. Moshkov)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Decision trees are widely used in many areas of computer science, for example, test theory (initiated by Chegis and Yablonskii [1]), rough set theory (initiated by Pawlak [2, 3, 4]), and exact learning (initiated by Angluin [5, 6]). These theories are closely related: attributes from rough set theory and test theory correspond to membership queries from exact learning. Exact learning also studies equivalence queries. The notion of a "minimally adequate teacher" using both membership and equivalence queries was discussed by Angluin in [7]. Relations between exact learning and PAC learning, proposed by Valiant [8], were considered in [5].

In [9, 10, 11], we added the notion of a hypothesis (an analog of equivalence queries) to the model considered in both rough set theory and test theory, and proposed dynamic programming algorithms for the optimization of decision trees with hypotheses. Note that dynamic programming algorithms for the optimization of conventional decision trees, which do not use hypotheses, were proposed earlier [12].

In the present paper, we consider an application of the dynamic programming algorithms from [9, 10, 11] to the study of the problem of sorting. We compare the complexity of five types of decision trees, optimal relative to the depth and relative to the number of realizable nodes, based on various combinations of attributes and hypotheses for sorting 𝑛 pairwise different elements from a linearly ordered set, 𝑛 = 3, . . . , 6. The results obtained for the conventional decision trees are known – see the book [12]. The results obtained for the decision trees with hypotheses are completely new.

Note that in the present paper we follow [11] when discussing the notions related to decision trees with hypotheses. Complete definitions of these notions can be found in that paper.

2. Five Types of Decision Trees and Their Optimization

Let 𝑇 be a decision table with 𝑛 conditional attributes 𝑓1, . . . , 𝑓𝑛 that have values from the set 𝜔 = {0, 1, 2, . . .}. The rows of this table are pairwise different, and each row is labeled with a decision. For a given row of 𝑇, we should recognize the decision attached to it. To this end, we use decision trees based on two types of queries.

We can ask about the value of a conditional attribute 𝑓𝑖 ∈ {𝑓1, . . . , 𝑓𝑛} on the given row. As a result, we obtain an answer of the form 𝑓𝑖 = 𝛿, where 𝛿 is the number at the intersection of the given row and the column 𝑓𝑖. We can also ask whether a hypothesis {𝑓1 = 𝛿1, . . . , 𝑓𝑛 = 𝛿𝑛} is true, where the numbers 𝛿1, . . . , 𝛿𝑛 belong to the columns 𝑓1, . . . , 𝑓𝑛, respectively. Either this hypothesis is confirmed or we obtain a counterexample of the form 𝑓𝑖 = 𝜎, where 𝑓𝑖 ∈ {𝑓1, . . . , 𝑓𝑛} and 𝜎 is a number from the column 𝑓𝑖 that is different from 𝛿𝑖. We say that this hypothesis is proper if (𝛿1, . . . , 𝛿𝑛) is a row of the table 𝑇.
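As an illustration of this query protocol, the following minimal sketch (ours, not from the paper) simulates both query types against a hidden row. The table representation and the function names are assumptions; here the answer to a refuted hypothesis is simply the first mismatching attribute, whereas in the worst-case analysis the counterexample may be chosen adversarially.

```python
# A minimal sketch of the two query types, assuming a decision table is stored
# as a list of rows (tuples of values from omega = {0, 1, 2, ...}).
# All names here are illustrative, not taken from the paper.

def attribute_query(hidden_row, i):
    """Answer the query on attribute f_i: returns delta such that f_i = delta."""
    return hidden_row[i]

def hypothesis_query(hidden_row, deltas):
    """Answer the hypothesis {f_1 = delta_1, ..., f_n = delta_n}.

    Returns None if the hypothesis is confirmed; otherwise a counterexample
    (i, sigma) meaning f_i = sigma with sigma != delta_i.  We return the first
    mismatch, but any mismatching attribute is a legal answer.
    """
    for i, delta in enumerate(deltas):
        if hidden_row[i] != delta:
            return (i, hidden_row[i])
    return None

def is_proper(table, deltas):
    """A hypothesis is proper if (delta_1, ..., delta_n) is a row of the table."""
    return tuple(deltas) in {tuple(row) for row in table}
```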
We study the following five types of decision trees:

1. Decision trees based on attributes only.
2. Decision trees based on hypotheses only.
3. Decision trees based on both attributes and hypotheses.
4. Decision trees based on proper hypotheses only.
5. Decision trees based on both attributes and proper hypotheses.

As the time complexity of a decision tree, we consider its depth, which is equal to the maximum number of queries in a path from the root to a terminal node of the tree. As the space complexity, we consider the number of nodes in the tree that are realizable relative to 𝑇. A node is called realizable relative to 𝑇 if the computation in the tree passes through this node for some row of 𝑇 and some choice of counterexamples. We use the following notation:

• ℎ(𝑘)(𝑇) denotes the minimum depth of a decision tree of type 𝑘 for 𝑇, 𝑘 = 1, . . . , 5.
• 𝐿(𝑘)(𝑇) denotes the minimum number of nodes realizable relative to 𝑇 in a decision tree of type 𝑘 for 𝑇, 𝑘 = 1, . . . , 5.

In [9] and [10], dynamic programming algorithms for the optimization of decision trees of all five types relative to the depth and the number of realizable nodes were proposed (see also the journal extension [11] of these papers, which additionally considers two cost functions: the number of realizable terminal nodes and the number of nonterminal nodes). Note that algorithms for the minimization of the depth and the number of nodes of decision trees of type 1 were considered in [12] for decision tables with one-valued decisions and in [13] for decision tables with many-valued decisions.

Dynamic programming optimization algorithms are applicable to medium-sized decision tables. These algorithms first construct a directed acyclic graph (DAG) whose nodes are certain subtables of the original decision table, each given by a set of conditions of the form "attribute = value". They then pass through all the nodes of the DAG, starting with the simplest subtables, and for each subtable find the minimum value of the considered cost function.
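As an illustration of this scheme, the following sketch computes ℎ(1)(𝑇) for a small table by a memoized recursion over subtables; the cache keyed by sets of row indices plays the role of the DAG of subtables. It is a minimal version covering only type-1 trees and the depth, not the authors' implementation; the representation of the table is our own assumption, and the handling of hypotheses and of the other cost functions in [9, 10, 11] follows the same bottom-up pattern but is more involved.

```python
# A minimal sketch of the dynamic programming scheme for h^(1)(T): the minimum
# depth of a decision tree that uses attribute queries only.  The table is
# assumed to be given as a list of rows plus a parallel list of decisions.
from functools import lru_cache

def min_depth_type1(rows, decisions):
    n_attrs = len(rows[0])

    @lru_cache(maxsize=None)
    def h(subtable):
        # subtable: frozenset of row indices, i.e. one node of the DAG
        idx = list(subtable)
        if len({decisions[i] for i in idx}) == 1:
            return 0  # all rows share one decision: no queries needed
        best = None
        for a in range(n_attrs):
            values = {rows[i][a] for i in idx}
            if len(values) == 1:
                continue  # a constant attribute gives no information here
            # one query plus the worst case over the possible answers f_a = v
            cost = 1 + max(
                h(frozenset(i for i in idx if rows[i][a] == v)) for v in values
            )
            if best is None or cost < best:
                best = cost
        return best

    return h(frozenset(range(len(rows))))
```

On the sorting table 𝑇3 constructed in the next section, this sketch returns 3, in agreement with the value of ℎ(1)(𝑇3) in Table 1 below.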
In the present paper, we use the algorithms proposed in [9, 10, 11] to study decision trees of all five types that are optimal relative to the depth and relative to the number of realizable nodes for the sorting problem. Results for decision trees of type 1 were obtained earlier [12]. Results for decision trees of types 2–5 are new.

3. Problem of Sorting

In this paper, we study the problem of sorting 𝑛 elements. Let 𝑥1, . . . , 𝑥𝑛 be pairwise different elements from a linearly ordered set. We should find a permutation (𝑝1, . . . , 𝑝𝑛) from the set 𝑃𝑛 of all permutations of the set {1, . . . , 𝑛} for which 𝑥𝑝1 < · · · < 𝑥𝑝𝑛. To this end, we use attributes 𝑥𝑖 : 𝑥𝑗, 𝑖, 𝑗 ∈ {1, . . . , 𝑛}, 𝑖 < 𝑗, such that 𝑥𝑖 : 𝑥𝑗 = 1 if 𝑥𝑖 < 𝑥𝑗 and 𝑥𝑖 : 𝑥𝑗 = 0 if 𝑥𝑖 > 𝑥𝑗.

The problem of sorting 𝑛 elements can be represented as a decision table 𝑇𝑛 with 𝑛(𝑛 − 1)/2 conditional attributes 𝑥𝑖 : 𝑥𝑗, 𝑖, 𝑗 ∈ {1, . . . , 𝑛}, 𝑖 < 𝑗, and 𝑛! rows corresponding to the permutations from 𝑃𝑛. For each permutation (𝑝1, . . . , 𝑝𝑛), the corresponding row of 𝑇𝑛 is labeled with this permutation as its decision. This row is filled with the values of the attributes 𝑥𝑖 : 𝑥𝑗 such that 𝑥𝑖 : 𝑥𝑗 = 1 if and only if 𝑖 appears before 𝑗 in the tuple (𝑝1, . . . , 𝑝𝑛).
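The table 𝑇𝑛 can be generated directly from this definition, as the following sketch shows; the representation matches the earlier sketches, and the function name is our own.

```python
# A minimal sketch constructing the decision table T_n for sorting n elements:
# one attribute x_i : x_j per pair i < j, one row per permutation of {1,...,n}.
from itertools import combinations, permutations

def sorting_table(n):
    pairs = list(combinations(range(1, n + 1), 2))  # attributes x_i : x_j, i < j
    table = []
    for p in permutations(range(1, n + 1)):
        pos = {elem: k for k, elem in enumerate(p)}  # position of each element in p
        # x_i : x_j = 1 iff i appears before j in the tuple (p_1, ..., p_n)
        row = tuple(1 if pos[i] < pos[j] else 0 for i, j in pairs)
        table.append((row, p))  # the row is labeled with its permutation
    return pairs, table

pairs, table = sorting_table(4)
assert len(pairs) == 4 * 3 // 2  # n(n-1)/2 conditional attributes
assert len(table) == 24          # n! rows, one per permutation from P_n
```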
For 𝑛 = 3, . . . , 6 and 𝑘 = 1, . . . , 5, we found the values of ℎ(𝑘)(𝑇𝑛) and 𝐿(𝑘)(𝑇𝑛) using the dynamic programming algorithms described in [9, 10, 11] – see the results in Tables 1 and 2.

Table 1
Experimental results for the depth

  𝑛   ℎ(1)(𝑇𝑛)   ℎ(2)(𝑇𝑛)   ℎ(3)(𝑇𝑛)   ℎ(4)(𝑇𝑛)   ℎ(5)(𝑇𝑛)
  3       3           2           2           2           2
  4       5           4           4           4           4
  5       7           6           6           6           6
  6      10           9           9           9           9

Table 2
Experimental results for the number of realizable nodes

  𝑛   𝐿(1)(𝑇𝑛)   𝐿(2)(𝑇𝑛)   𝐿(3)(𝑇𝑛)   𝐿(4)(𝑇𝑛)   𝐿(5)(𝑇𝑛)
  3      11          13           9          14           9
  4      47         253          39         254          39
  5     239      15,071         199      15,142         199
  6   1,439   2,885,086       1,199   2,886,752       1,199

From the obtained experimental results it follows that decision trees of types 2–5 can have smaller depth than decision trees of type 1. Decision trees of types 3 and 5 can have fewer realizable nodes than decision trees of type 1. Decision trees of types 2 and 4, however, require far more nodes: for 𝑛 = 6, almost three million realizable nodes against 1,439 for type 1.

4. Conclusions

In this paper, we found the minimum depth and the minimum number of realizable nodes of five types of decision trees for sorting 𝑛 elements, 𝑛 = 3, . . . , 6. In the future, we plan to study the joint behavior of the depth and the number of nodes in such decision trees. It would also be interesting to compare the complexity of optimal decision trees of the considered five types constructed by dynamic programming algorithms with the complexity of decision trees constructed by the entropy-based greedy algorithm proposed in [14].

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The authors are indebted to the anonymous reviewers for interesting comments.

References

[1] I. A. Chegis, S. V. Yablonskii, Logical methods of control of work of electric schemes, Trudy Mat. Inst. Steklov (in Russian) 51 (1958) 270–360.
[2] Z. Pawlak, Rough sets, Int. J. Parallel Program. 11 (1982) 341–356.
[3] Z. Pawlak, Rough Sets – Theoretical Aspects of Reasoning about Data, volume 9 of Theory and Decision Library: Series D, Kluwer, 1991.
[4] Z. Pawlak, A. Skowron, Rudiments of rough sets, Inf. Sci. 177 (2007) 3–27.
[5] D. Angluin, Queries and concept learning, Mach. Learn. 2 (1988) 319–342.
[6] D. Angluin, Queries revisited, Theor. Comput. Sci. 313 (2004) 175–194.
[7] D. Angluin, Learning regular sets from queries and counterexamples, Inf. Comput. 75 (1987) 87–106.
[8] L. G. Valiant, A theory of the learnable, Commun. ACM 27 (1984) 1134–1142.
[9] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Minimizing depth of decision trees with hypotheses (to appear), in: International Joint Conference on Rough Sets (IJCRS 2021), 19–24 September 2021, Bratislava, Slovakia, 2021.
[10] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Minimizing number of nodes in decision trees with hypotheses (to appear), in: 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2021), 8–10 September 2021, Szczecin, Poland, 2021.
[11] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Optimization of decision trees with hypotheses for knowledge representation, Electronics 10 (2021) 1580. URL: https://doi.org/10.3390/electronics10131580.
[12] H. AbouEisha, T. Amin, I. Chikalov, S. Hussain, M. Moshkov, Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining, volume 146 of Intelligent Systems Reference Library, Springer, 2019.
[13] F. Alsolami, M. Azad, I. Chikalov, M. Moshkov, Decision and Inhibitory Trees and Rules for Decision Tables with Many-valued Decisions, volume 156 of Intelligent Systems Reference Library, Springer, 2020.
[14] M. Azad, I. Chikalov, S. Hussain, M. Moshkov, Entropy-based greedy algorithm for decision trees using hypotheses, Entropy 23 (2021) 808. URL: https://doi.org/10.3390/e23070808.