    The Quantum Version Of Classification Decision Tree
              Constructing Algorithm C5.0

                                           Kamil Khadiev
             Kazan Federal University, 18, Kremlyovskaya st, Kazan, Russia, 420008;
            Zavoisky Physical-Technical Institute, FRC Kazan Scientific Center of RAS
                           10/7, Sibirsky tract, Kazan, Russia, 420029
                                       kamilhadi@gmail.com
                                          Ilnaz Mannapov
                                    Kazan Federal University
                          18, Kremlyovskaya st, Kazan, Russia, 420008
                                      ilnaztatar5@gmail.com
                                            Liliya Safina
                                    Kazan Federal University
                          18, Kremlyovskaya st, Kazan, Russia, 420008
                                     liliasafina94@gmail.com




                                                         Abstract
In this paper, we focus on the complexity of the C5.0 algorithm for constructing
a decision tree classifier, one of the standard models for the classification
problem in machine learning. In the classical case the decision tree is
constructed in O(hd(N M + N log N)) running time, where M is the number of
classes, N is the size of the training data set, d is the number of attributes
of each element, and h is the tree height. Firstly, we improve the classical
version; the running time of the new version is O(h · d · N log N). Secondly,
we suggest a quantum version of this algorithm, which uses quantum subroutines
such as amplitude amplification and the Dürr-Høyer minimum search algorithm,
both based on Grover's algorithm. The running time of the quantum algorithm is
O(h · √d log d · N log N), which is better than the complexity of the classical
algorithm.




1    Introduction
Quantum computing [30, 6] is one of the hot topics in computer science of the last decades. There are many
problems where quantum algorithms outperform the best known classical algorithms [10, 19, 25, 24]. The superiority
of quantum over classical computation has been shown for different computational models such as query models,
streaming processing models, communication models and others [29, 3, 2, 23, 20, 18, 22, 21, 4]. Today quantum
computing is often used in machine learning to speed up the construction of machine learning models or to predict a result for new input data

Copyright 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: S. Hölldobler, A. Malikov (eds.): Proceedings of the YSIP-3 Workshop, Stavropol and Arkhyz, Russian Federation,
17-09-2019–20-09-2019, published at http://ceur-ws.org




[7, 27, 37, 36]. Sometimes the learning process (construction) of a machine learning model takes a long time
because of the large size of the data. Even a small reduction in running time can provide a significant practical
benefit.
   Decision trees are often used to build a classifier. Random forest [16] and gradient tree boosting [13] models
are very popular and effective for solving classification and regression problems. These algorithms are based on
decision trees. There are several algorithms for tree construction, such as CART [28], ID3 [32], C4.5 [34], C5.0
[35] and others. We consider the C5.0 algorithm [1] for decision tree classifiers. It works in O(hd(N M + N log N))
running time, where h is the height of the tree, N is the size of the training set, d is the number of attributes of
one vector from the training set, and M is the number of classes.
   In this paper, firstly, we present an improved version of the classical algorithm that uses a self-balancing binary
search tree [9] and has O(hdN log N) running time. As a self-balancing binary search tree we can use the AVL
tree [5, 9] or the Red-Black tree [15, 9]. Secondly, we describe a quantum version of the C5.0 algorithm. We call it
QC5.0. The running time of QC5.0 is O(h√d log d · N log N). The algorithm is based on generalizations
of Grover's search algorithm [14], namely amplitude amplification [8] and the Dürr-Høyer algorithm for minimum
search [11].
   The paper has the following structure. Section 2 contains preliminaries. Descriptions of the classical C4.5
and C5.0 algorithms are given in Section 3. Section 4 contains improvements of the classical algorithm. We present
the quantum algorithm in Section 5.


2    Preliminaries
Machine learning [12, 32] allows us to predict a result using information about past events. The C5.0 algorithm is
used to construct a decision tree for the classification problem [26]. Let us state the classification problem
formally.
   There are two sequences: X = {X^1, X^2, ..., X^N} is a training data set and Y = {y_1, y_2, ..., y_N} is the set of
corresponding classes. Here X^i = (x^i_1, x^i_2, ..., x^i_d) is a vector of attributes, where i ∈ {1, ..., N}, d is the number
of attributes, N is the number of vectors in the training data set, and y_i ∈ C = {1, ..., M} is the class of the
vector X^i. An attribute x^i_j is either a real-valued variable or a discrete-valued variable, i.e. x^i_j ∈ {1, ..., T_j} for some
integer T_j. Let DOM_j = R if x_j is real-valued, and DOM_j = {1, ..., T_j} if x_j is a discrete-valued attribute.
The problem is to construct a function F : DOM_1 × ... × DOM_d → C that is called a classifier. The function
classifies a new vector X = (x_1, ..., x_d) that is not from X.
   There are many algorithms for constructing a classifier. Decision trees and the C5.0 algorithm for constructing a
decision tree are the central subject of this work.
   A decision tree is a tree such that each node tests some condition on the input variables. Suppose B is some
test with outcomes b_1, b_2, ..., b_t that is tested in a node. Then there are t outgoing edges from the node, one for each
outcome. Each leaf is associated with a resulting class from C. The testing process is the following. We start testing
conditions from the root node and follow the edges according to the results of the conditions. The label of the reached
leaf is the result of the classification process.
   Our algorithm uses some quantum algorithms as subroutines, and the rest is classical. As quantum
algorithms, we use query model algorithms. These algorithms can query a black box that has access to
the training data set and the stored data. By the running time of an algorithm, we mean the number of queries to the
black box. In the classical case, we use the classical analog of this computational model, the query model. We
suggest [30] as a good book on quantum computing and [6] for a description of the query model.


3    An Overview of the C4.5 and C5.0 Algorithms
We consider a classifier F that is expressed by a decision tree. This section is dedicated to the C5.0 algorithm
for decision tree construction for the classification problem. This algorithm is an improved version of the C4.5
algorithm, and it is part of the commercial system See5/C5.0. The C4.5 and C5.0 algorithms were proposed
by Ross Quinlan [26]. Let us discuss these algorithms. C4.5 belongs to a succession of decision tree learners that
trace their origins back to the work of Hunt and others in the late 1950s and early 1960s [17]. Its immediate
predecessors were ID3 [31], a simple system consisting initially of about 600 lines of Pascal, and C4 [33].




3.1   The Structure of the Tree
Decision tree learners use a method known as divide and conquer to construct a suitable tree from a training set
X of vectors:
  • If all vectors in X belong to the same class c ∈ C, then the decision tree is a leaf labeled by c.
  • Otherwise, let B be some test with outcomes b_1, b_2, ..., b_t that produces a non-trivial partition of X. Let
    X_i be the set of training vectors from X that have outcome b_i of B. Then the tree is presented in Figure 1.
    Here T_i is the result of growing a decision tree for the set X_i.




   Figure 1: Testing B with outcomes b1 , b2 , . . . , bt . Here Ti is a result of growing a decision tree for a set Xi

  C4.5 uses tests of three types, each of which involves only a single attribute A_a. Decision regions in the instance
space are thus bounded by hyperplanes, each of which is orthogonal to one of the attribute axes.

  • If x_j is a discrete-valued attribute from {1, . . . , T_j}, then the possible tests are
       – x_j = ? with T_j outcomes, one for each value from {1, . . . , T_j}. (This is the default test.)
       – x_j ∈ G where G ⊂ {1, . . . , T_j}. Tests of this kind are found by a greedy search that maximizes the
         value of the splitting criterion (discussed below).
  • If x_j is a real-valued attribute, then a test is “x_j ≤ θ” with two outcomes, “true” and “false”. Here
    θ is a constant threshold. Possible values of θ are found by sorting the distinct values of the set {x^1_j, . . . , x^N_j}.
    Possible thresholds are values between each pair of adjacent values in the sorted sequence. So, if the training
    vectors from X take v distinct values of the j-th attribute, then v − 1 thresholds are considered (a small
    sketch is given below).
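  As an illustration of threshold selection (a sketch only, not the C5.0 implementation; the function name is ours),
the candidate thresholds for one real-valued attribute can be listed as follows:

    def candidate_thresholds(values):
        """Return the midpoints between adjacent distinct sorted values of one
        real-valued attribute; these are the possible thresholds theta."""
        distinct = sorted(set(values))
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    # e.g. 4 distinct values -> 3 candidate thresholds
    print(candidate_thresholds([0.4, 2.5, 3.7, 1.0, 2.5]))  # [0.7, 1.75, 3.1]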

3.2   Test Selection Procedure
C4.5 relies on a greedy search, selecting a candidate test that maximizes a heuristic splitting criterion.
   Two criteria are used in C4.5: information gain and gain ratio. Let C_j = {i : i ∈ {1, . . . , |X|}, y_i = j}
be the set of indexes of training vectors from X that belong to the j-th class, for j ∈ C = {1, . . . , M}. Let RF(j, X)
be the relative frequency of training vectors in X with indexes from C_j: RF(j, X) = |C_j| / |X|. The information
content of a message that identifies the class of vectors from X is

   I(X) = − Σ_{j=1}^{M} RF(j, X) log(RF(j, X)).

After we split X into subsets X_1, X_2, . . . , X_t with respect to a test B, the information gain is

   G(X, B) = I(X) − Σ_{i=1}^{t} (|X_i| / |X|) · I(X_i).

The potential information from the partition itself is

   P(X, B) = − Σ_{i=1}^{t} (|X_i| / |X|) log(|X_i| / |X|).

The test B is chosen so that it maximizes the gain ratio G(X, B) / P(X, B).
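   A minimal Python sketch of these formulas (the function names are ours; this is illustrative, not the See5/C5.0
code) might look like:

    from math import log2

    def info(classes):
        """I(X): entropy of the class labels of a set of training vectors."""
        n = len(classes)
        result = 0.0
        for c in set(classes):
            rf = classes.count(c) / n          # relative frequency RF(c, X)
            result -= rf * log2(rf)
        return result

    def gain_ratio(classes, partition):
        """Gain ratio G(X, B) / P(X, B) for the partition of the class labels
        induced by some test B."""
        n = len(classes)
        gain = info(classes)
        potential = 0.0
        for part in partition:                 # part = class labels of one subset X_i
            w = len(part) / n
            gain -= w * info(part)
            potential -= w * log2(w)
        return gain / potential

    # Example: 6 vectors split into two subsets by some test B.
    labels = [1, 1, 2, 2, 2, 1]
    print(gain_ratio(labels, [[1, 1, 2], [2, 2, 1]]))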


3.3   Notes on C5.0 algorithm
C4.5 was superseded in 1997 by the commercial system See5/C5.0 (or C5.0 for short). The changes encompass new
capabilities as well as much-improved efficiency, and include the following items. (1) A variant of boosting [33],
which constructs an ensemble of classifiers that are later used to give a final classification. Boosting often leads
to a dramatic improvement in predictive accuracy. (2) New data types (e.g., dates), “not applicable” values,
variable misclassification costs, and mechanisms for pre-filtering of attributes. (3) Unordered rule sets: when a
vector is classified, all applicable rules are found and voted. This improves both
the interpretability of rule sets and their predictive accuracy. (4) Greatly improved scalability of both decision
trees and (particularly) rule sets (sets of if-then rules, a representation of a decision tree). Scalability is enhanced
by multi-threading; C5.0 can take advantage of computers with multiple CPUs and/or cores.




   More details are available in [1] ( http://rulequest.com/see5-comparison.html). At the same time, the process
of choosing a test B was not changed significantly. In this work, we focus on this process and improve its
complexity.

3.4    Running Time of the One-threading Tree Constructing Part of C4.5 and C5.0 algorithms
Let us recall the parameters of the model. N is the number of vectors in the training set, M is the number of classes,
and d is the number of attributes (elements of vectors from the training set). Let the height h of the constructed tree
be a parameter of the algorithm. Let RV be the set of indexes of real-valued attributes and let DV be the set of
indexes of discrete-valued attributes.
   Let us describe the procedure step by step because we will use it for our improvements. Assume that we
construct a binary tree of height h.
   The main procedure is ConstructClassifiers, which invokes a recursive procedure FormTree for construct-
ing nodes. The main parameters of FormTree are level, which is the index of the tree level; tree, which is the resulting
subtree that the procedure will construct; and X', which is the set that we use for constructing this subtree.
   Let us present the ConstructClassifiers and FormTree procedures as Algorithm FormTree. The
FormTree procedure does two steps. The first one, ChooseSplit, is choosing the test B, that is, choosing
an attribute and the split by this attribute that maximize the objective function G(X', B) / P(X', B). The resulting
attribute index is attr and the resulting split is the split variable. The second step, Divide, is the splitting process itself.




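   The algorithm listing referenced above is not reproduced here; the following is a minimal Python sketch of the
recursion, under assumed names (construct_classifier, form_tree, the stopping conditions and the choose_split
stub are our simplifications of the procedures described in the text):

    def construct_classifier(X, Y, h):
        """Build a decision tree of height at most h from training set X with labels Y."""
        return form_tree(level=0, data=list(zip(X, Y)), max_level=h)

    def form_tree(level, data, max_level):
        classes = {y for _, y in data}
        if len(classes) == 1 or level == max_level:
            # Leaf: label it with the majority class of the remaining vectors.
            labels = [y for _, y in data]
            return {"leaf": max(classes, key=labels.count)}
        # ChooseSplit: pick an attribute and a split (ideally maximizing the gain ratio).
        attr, split = choose_split(data)
        # Divide: recursively grow a subtree for each non-empty part of the split.
        children = [form_tree(level + 1, part, max_level) for part in split if part]
        return {"attr": attr, "children": children}

    def choose_split(data):
        """Stub standing in for the ChooseSplit procedure described below;
        it simply splits on attribute 0 at its median value."""
        values = sorted(x[0] for x, _ in data)
        theta = values[len(values) // 2]
        left = [(x, y) for x, y in data if x[0] <= theta]
        right = [(x, y) for x, y in data if x[0] > theta]
        return 0, [left, right]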
   Let us describe the ChooseSplit procedure that provides the best attribute index and the split itself. It is
presented in Algorithm ChooseSplit. The procedure considers each attribute, and it has two different kinds of
processing of an attribute depending on whether it belongs to RV or DV. Let us describe the procedure for a
real-valued attribute attr. We have three steps.
   The first step is sorting X' by the x_attr element of the vectors. This procedure is Sort(X', j). Assume that the
resulting indexes in sorted order are (i_1, . . . , i_z), where z = |X'|. So, now we can split the vectors of X' by
θ_u = (x^{i_u}_attr + x^{i_{u+1}}_attr) / 2, obtaining the two sets X_1 = {X^{i_1}, . . . , X^{i_u}} and
X_2 = {X^{i_{u+1}}, . . . , X^{i_z}}, for u ∈ {1, . . . , z − 1}.
   The second step is computing pC_j[u] = |C_j^u|, pI[u] = I({X^{i_1}, . . . , X^{i_u}}) and pbI[u] = I({X^{i_u}, . . . , X^{i_z}}),
where u ∈ {1, . . . , z}, C_j^u = {w : w ∈ {1, . . . , u}, y_{i_w} = j}, j ∈ {1, . . . , M}. We use the following formulas for the
values:

   pC_j[u] = pC_j[u − 1] + 1 if y_{i_u} = j; and pC_j[u] = pC_j[u − 1] otherwise.

   pI[u] = pI[u − 1] − ( −(pC_j[u − 1]/N) log(pC_j[u − 1]/N) + (pC_j[u]/N) log(pC_j[u]/N) ), if y_{i_u} = j.

   pbI[u] = pbI[u + 1] − ( −((pC_j[z] − pC_j[u])/N) log((pC_j[z] − pC_j[u])/N)
            + ((pC_j[z] − pC_j[u − 1])/N) log((pC_j[z] − pC_j[u − 1])/N) ), if y_{i_u} = j.

   The third step is choosing the maximum max_{u ∈ {1, . . . , z − 1}} G(X', u) / P(X', u), where
G(X', u) = I(X') − (u/N) · pI[u] − ((N − u)/N) · pbI[u + 1] and
P(X', u) = −(u/N) · log(u/N) − ((N − u)/N) · log((N − u)/N). We use these formulas because they correspond to splitting into
X_1 = {X^{i_1}, . . . , X^{i_u}} and X_2 = {X^{i_{u+1}}, . . . , X^{i_z}}, with I(X_1) = pI[u], I(X_2) = pbI[u + 1] and I(X') = pI[z].
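   A direct (non-incremental) Python sketch of these three steps for one real-valued attribute is given below; it
recomputes the prefix and suffix entropies from prefix class counts rather than applying the update formulas above,
and all names are ours:

    from math import log2

    def entropy_from_counts(counts, total):
        """Entropy of a class distribution given per-class counts."""
        result = 0.0
        for c in counts:
            if c:
                p = c / total
                result -= p * log2(p)
        return result

    def best_real_split(data, attr):
        """Return (best gain ratio, threshold) for the real-valued attribute attr.
        data is a list of (vector, class) pairs; an illustrative sketch only."""
        order = sorted(data, key=lambda vy: vy[0][attr])
        z = len(order)
        labels = [y for _, y in order]
        total_counts = {}
        for y in labels:
            total_counts[y] = total_counts.get(y, 0) + 1
        i_total = entropy_from_counts(total_counts.values(), z)    # I(X')
        prefix = {}
        best = (0.0, None)
        for u in range(1, z):                                      # split after position u
            prefix[labels[u - 1]] = prefix.get(labels[u - 1], 0) + 1
            if order[u - 1][0][attr] == order[u][0][attr]:
                continue                                           # no threshold between equal values
            suffix = [total_counts[c] - prefix.get(c, 0) for c in total_counts]
            p_i = entropy_from_counts(prefix.values(), u)          # plays the role of pI[u]
            pb_i = entropy_from_counts(suffix, z - u)              # plays the role of pbI[u+1]
            gain = i_total - (u / z) * p_i - ((z - u) / z) * pb_i
            potential = -(u / z) * log2(u / z) - ((z - u) / z) * log2((z - u) / z)
            theta = (order[u - 1][0][attr] + order[u][0][attr]) / 2
            if gain / potential > best[0]:
                best = (gain / potential, theta)
        return best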
   If we process a discrete-valued attribute from DV, then we can compute the value of the objective function
when we split all elements of X' according to the value of the attribute. So X_w = {i : X^i ∈ X', x^i_attr = w}, for
w ∈ {1, . . . , T_attr}.
   Let us describe the processing of discrete-valued attributes. The first step is computing the case numbers of
classes before the split, the case numbers of classes after the split, and the case numbers for the t values of the
current attribute.
   The second step is calculating the entropy I(X) before the split. The third step is calculating the entropies
I(X_i) after the split into t branches, the information gain G(X, B) = I(X) − Σ_{i=1}^{t} (|X_i| / |X|) · I(X_i), and the
potential information P(X, B) = − Σ_{i=1}^{t} (|X_i| / |X|) log(|X_i| / |X|). The last step is calculating the gain ratio
G(X, B) / P(X, B).


   Let us describe the Divide procedure that splits the set of vectors. The procedure is also described in
Algorithms ChooseSplit and Divide. The Divide procedure recursively invokes the FormTree procedure for each
set from the sequence of sets split in order to construct the child subtrees.



  Let us discuss the running time of the algorithm.

Theorem 1 The running time of C5.0 is O(h · d · (M · N + N log N )).

Proof      The procedure FormTree has two main subroutines: ChooseSplit and Divide. The procedure
Divide recursively invokes the FormTree procedure. In fact, ChooseSplit takes the main part of the time for each node
of the tree. That is why we focus on analyzing this procedure.
   Let us consider a real-valued attribute. The running time for computing pI, pbI and pC is O(|X'|). The
running time of the sorting procedure is O(|X'| log |X'|). The running time of computing a maximum of gain ratios
for different splits is O(|X'|). Additionally, we should initialize the pC array, which takes O(M). The total complexity
of processing a real-valued attribute in the ProcessAttribute procedure is O(M + |X'| log |X'|).




   Let us consider a discrete-valued attribute. The processing of the cases takes O(|X'|) time. The information
gain G(X, B) for some discrete attribute B is calculated in O(M · t) running time, where t is the number of
attribute values and M is the number of classes. The entropy before the split, I(X'), is calculated in O(M) running
time, and the entropy after the split is calculated in O(M · t). The potential information P(X, B) is calculated in
O(t) running time. The gain ratio is calculated in O(1) running time. Therefore, the running time of processing
one discrete-valued attribute in the ProcessAttribute procedure is O(|X'| + M · t).
   Note that if we consider all the sets X' of one level of the decision tree, then together they contain all elements of X. The
running time for real-valued attributes is

   O( Σ_{i=1}^{k} (M + |X'_i| log |X'_i|) ) ≤ O(kM + N log N) ≤ O(N · M + N log N),

because k is the number of nodes on one level and k ≤ N, and

   O( Σ_{i=1}^{k} |X'_i| log |X'_i| ) ≤ O( Σ_{i=1}^{k} |X'_i| log N ) = O( log N · Σ_{i=1}^{k} |X'_i| ) ≤ O(N log N).

The running time for discrete-valued attributes is

   O( Σ_{i=1}^{k} (|X'_i| + M · t) ) ≤ O(N + M · t · k) ≤ O(N + M · N).

Therefore, the total complexity for one level is O(d · (M · N + N log N)), and the total
complexity for the whole tree is O(hd · (M · N + N log N)).  □



4     Improvement of the Classical C4.5/C5.0 algorithms
4.1   Improvement of Discrete-valued Attributes Processing
If we process a discrete-valued attribute from DV, then we can compute the value of the objective function when we
split all elements of X' according to the value of the attribute. So X_w = {i : X^i ∈ X', x^i_attr = w}, for w ∈ {1, . . . , T_attr}.
    We will process all vectors of X' one by one. Let us consider the processing of the current u-th vector X^{i_u} such that
y_{i_u} = j and x^{i_u}_attr = w. Let us compute the following variables: N_w is the number of elements of X_w; C_j is the
number of vectors from X' that belong to the class j; C_{j,w} is the number of vectors from X_w that belong to the
class j; P is the potential information; I_w is I(X_w); I is the information of X'; S = G(X', B) − I(X'). Assume that
these variables contain the values after processing the u-th vector and N'_w, C'_j, C'_{j,w}, P', I'_w, I' and S' contain the values
before processing the u-th vector. The final values of the variables are obtained after processing all z = |X'| vectors. We
recompute each variable according to the formulas from Table 1 (only the variables that depend on j and w
are changed).

                                           Table 1: Updating formulas

   N_w ← N'_w + 1,    C_j ← C'_j + 1,    C_{j,w} ← C'_{j,w} + 1,
   P ← P' − ( −(N'_w / z) log(N'_w / z) + (N_w / z) log(N_w / z) ),
   I ← I' − ( −(C'_j / z) log(C'_j / z) + (C_j / z) log(C_j / z) ),
   I_w ← I'_w − ( −(C'_{j,w} / N'_w) log(C'_{j,w} / N'_w) + (C_{j,w} / N_w) log(C_{j,w} / N_w) ),
   S ← S' − ( −(N'_w / z) log(N'_w / z) + (N_w / z) log(N_w / z) ).

    So, finally we obtain the new ProcessDiscrete procedure.
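   A compact Python sketch of this processing (our own illustrative code, not the paper's ProcessDiscrete listing; it
maintains the raw counters in one pass and computes the entropies from them at the end rather than keeping P, I,
I_w and S updated incrementally):

    from math import log2

    def process_discrete(data, attr):
        """One-pass processing of a discrete-valued attribute attr: count the
        per-value and per-class statistics, then evaluate the gain ratio.
        data is a list of (vector, class) pairs; an illustrative sketch only."""
        z = len(data)
        n_w = {}       # N_w: number of vectors with attribute value w
        c_j = {}       # C_j: number of vectors of class j
        c_jw = {}      # C_{j,w}: number of vectors of class j with value w
        for x, y in data:
            w = x[attr]
            n_w[w] = n_w.get(w, 0) + 1
            c_j[y] = c_j.get(y, 0) + 1
            c_jw[(y, w)] = c_jw.get((y, w), 0) + 1

        def entropy(counts, total):
            return -sum((c / total) * log2(c / total) for c in counts if c)

        i_total = entropy(c_j.values(), z)                         # I(X')
        gain, potential = i_total, 0.0
        for w, nw in n_w.items():
            i_w = entropy([c_jw.get((j, w), 0) for j in c_j], nw)  # I(X_w)
            gain -= (nw / z) * i_w
            potential -= (nw / z) * log2(nw / z)
        return gain / potential if potential else 0.0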




4.2   Using a Self-balancing Binary Search Tree
We suggest the following improvement. Let us use a self-balancing binary search tree [9] data structure for pC_j[u]
and C_{j,w}. As a self-balancing binary search tree we can use the AVL tree [5, 9] or the Red-Black tree [15, 9].
This data structure can be used to implement a mapping from a set of indexes to a set of values. We always
assume that the data structure contains only indexes with a non-zero value, and all other values are zero. We use
the indexes of the non-zero elements as keys for constructing the search tree and the values as additional data stored
in the corresponding nodes of the tree. In this paper we call this data structure a Tree Map. The data structure
has three properties concerning running time. (i) The running time of adding, removing or inserting a new index (that
is called a key) in the data structure is O(log s), where s is the number of keys in the tree, i.e. the number of indexes
with non-zero values. (ii) The running time of finding a value by index and of modifying the value is O(log s).
(iii) The running time of removing all indexes from the data structure, or of traversing all indexes of the data structure,
is O(s), where s is the number of indexes with non-zero values.
    If we use a Tree Map, then we can obtain the following running time.

Lemma 1 The running time of C5.0 that uses a Tree Map (self-balancing binary search tree) is O(h · d · N log N).

Proof      The proof is similar to the proof of Theorem 1, with the following exceptions. We do not need to ini-
tialize pC_j[u] and C_{j,w}; instead, we erase these values after processing an attribute, which takes O(|X'|)
steps. So, the running time of processing a real-valued attribute becomes O(N log N + N log N) = O(N log N),
and for a discrete-valued attribute it is O(N log N), because we process the vectors one by one and recomputing the
variables takes only O(log N) steps for updating the value of C_{j,w} and O(1) steps for the other actions. Therefore,
the total complexity is O(hd · N log N).  □
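   The key point is that only the keys actually touched while processing an attribute are erased afterwards, so no
Θ(M) initialization is paid per node. A small sketch of this bookkeeping (Python's dict is hash-based, so it only
illustrates the idea; the stated worst-case O(log s) bounds assume a balanced search tree such as an AVL or
Red-Black tree [5, 15, 9]):

    class TouchedCounter:
        """Counter map that stores only non-zero keys, so that clearing it after
        one attribute costs time proportional to the number of touched keys."""

        def __init__(self):
            self._data = {}      # stand-in for the balanced search tree

        def add(self, key, delta=1):
            self._data[key] = self._data.get(key, 0) + delta

        def get(self, key):
            return self._data.get(key, 0)   # absent keys count as zero

        def clear_touched(self):
            # Erase only the keys inserted while processing the current attribute.
            self._data.clear()

    # Usage inside attribute processing: count (class, value) pairs, then clean up.
    c_jw = TouchedCounter()
    for y, w in [(1, "a"), (2, "a"), (1, "b")]:
        c_jw.add((y, w))
    print(c_jw.get((1, "a")))   # -> 1
    c_jw.clear_touched()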


5     Quantum C5.0
The key idea of the improved version of the C5.0 algorithm is using Dürr and Høyer's algorithm for maximum
search and the amplitude amplification algorithm. These two algorithms in combination have the following property:

Lemma 2 Suppose we have a function f : {1, . . . , K} → R such that the running time of computing f(x) is
T(K). Then there is a quantum algorithm that finds an argument x_0 maximizing f(x_0); the expected running time
of the algorithm is O(√K · T(K)) and the success probability is at least 1/2.

Proof    For the proof, we can use Dürr and Høyer's algorithm for minimum search [11], replacing Grover's search
algorithm with the amplitude amplification version from [8] for computing f(x).  □

   Using this lemma, we can replace the maximum search over attributes in the ChooseSplit function and use
ProcessAttribute as the function f. Let us call the resulting function QChooseSplit. Additionally, to reduce the error
probability, we can repeat the maximum finding process log d times and choose the best solution. The procedure
is presented in Algorithm QChooseSplit.
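   The Algorithm QChooseSplit listing is not reproduced here. Below is a classical-control sketch of the repetition
argument, treating the quantum maximum search of Lemma 2 as a black box that succeeds with probability at
least 1/2 (the black box is simulated classically here for illustration only; the names and the stub are ours):

    import random

    def quantum_max_search(f, k):
        """Stub for the Durr-Hoyer maximum search with amplitude amplification:
        returns the true argmax with probability >= 1/2, otherwise a random index."""
        if random.random() < 0.5:
            return max(range(k), key=f)
        return random.randrange(k)

    def q_choose_split(f, d, repetitions):
        """Repeat the maximum search several times (the paper uses about log d
        repetitions) and keep the candidate with the best objective value f."""
        candidates = [quantum_max_search(f, d) for _ in range(repetitions)]
        return max(candidates, key=f)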




Theorem 2 The running time of the Quantum C5.0 algorithm is O((h√d N log N) log d). The success proba-
bility of QC5.0 is O((1 − 1/d)^k), where k is the number of inner (non-leaf) nodes.




Proof      The running time of ProcessAttribute is O(|X'| log |X'|). So the running time of the maximum
search is O(√d |X'| log |X'|). With the repetitions, the running time is O(√d |X'| log |X'| log d). If we
sum the running time over all nodes, then we obtain O((h√d N log N) log d). The success probability of the Dürr
and Høyer algorithm is 1/2. We call it log d times and choose the maximum among the log d obtained values of the gain ratio.
Then, we find the correct attribute for one node with success probability O(1 − 1/2^{log d}) = O(1 − 1/d). We should
find correct attributes for all nodes except the leaves. Thus, the success probability for the whole tree is equal to
O((1 − 1/d)^k), where k is the number of internal (non-leaf) nodes.  □


6    Conclusion
Firstly, we have suggested a version of the C4.5/C5.0 algorithm with a Tree Map (self-balancing binary search tree,
for example a Red-Black tree or an AVL tree) data structure. This version has a better running time. Secondly, we
have presented a quantum version of the C5.0 algorithm for the classification problem. This algorithm demonstrates
an almost quadratic speed-up with respect to the number of attributes.
   Acknowledgements
   The work is supported by the Russian Science Foundation, project No. 19-19-00656.

References
 [1] C5.0: An informal tutorial, 2019. https://www.rulequest.com/see5-unix.html.

 [2] F. Ablayev, M. Ablayev, K. Khadiev, and A. Vasiliev. Classical and quantum computations with restricted
     memory. LNCS, 11011:129–155, 2018.

 [3] F. Ablayev, A. Ambainis, K. Khadiev, and A. Khadieva. Lower bounds and hierarchies for quantum
     memoryless communication protocols and quantum ordered binary decision diagrams with repeated test. In
     SOFSEM, LNCS, 10706:197–211, 2018.

 [4] F. Ablayev, A. Gainutdinova, K. Khadiev, and A. Yakaryılmaz. Very narrow quantum OBDDs and width
     hierarchies for classical OBDDs. Lobachevskii Journal of Mathematics, 37(6):670–682, 2016.

 [5] George M Adel’son-Vel’skii and Evgenii Mikhailovich Landis. An algorithm for organization of information.
     In Doklady Akademii Nauk, volume 146, pages 263–266. Russian Academy of Sciences, 1962.

 [6] A. Ambainis. Understanding quantum algorithms via query complexity. arXiv:1712.06349, 2017.

 [7] S. Arunachalam and R. de Wolf. Guest column: a survey of quantum learning theory. ACM SIGACT News,
     48(2):41–67, 2017.

 [8] G. Brassard, P. Høyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and estimation. Con-
     temporary Mathematics, 305:53–74, 2002.

 [9] T. H Cormen, C. E Leiserson, R. L Rivest, and C. Stein. Introduction to Algorithms. McGraw-Hill, 2001.

[10] Ronald De Wolf. Quantum computing and communication complexity. 2001.

[11] C. Dürr and P. Høyer. A quantum algorithm for finding the minimum. arXiv:quant-ph/9607014, 1996.

[12] Alpaydin Ethem. Introduction to machine learning. 2010.

[13] J. H. Friedman. Greedy function approximation: A gradient boosting machine. 1999.

[14] Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth
     annual ACM symposium on Theory of computing, pages 212–219. ACM, 1996.

[15] L. J Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proceedings of SFCS 1978,
     pages 8–21. IEEE, 1978.

[16] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning:Data Mining, Inference,
     and Prediction. Second Edition. 2009.




[17] EB Hunt. Concept learning: An information processing problem. 1962.
[18] R. Ibrahimov, K. Khadiev, K. Prūsis, and A. Yakarylmaz. Error-free affine, unitary, and probabilistic
     OBDDs. Lecture Notes in Computer Science, 10952 LNCS:175–187, 2018.
[19] Stephen Jordan. Bounded error quantum algorithms zoo. https://math.nist.gov/quantum/zoo.
[20] K. Khadiev and A. Khadieva. Reordering method and hierarchies for quantum and classical ordered binary
     decision diagrams. In CSR 2017, volume 10304 of LNCS, pages 162–175. Springer, 2017.

[21] K. Khadiev and A. Khadieva. Quantum online streaming algorithms with logarithmic memory. International
     Journal of Theoretical Physics, 2019.
[22] K. Khadiev and A. Khadieva. Two-way quantum and classical machines with small memory for online
     minimization problems. In International Conference on Micro- and Nano-Electronics 2018, volume 11022
     of Proc. SPIE, page 110222T, 2019.

[23] K. Khadiev, A. Khadieva, and I. Mannapov. Quantum online algorithms with respect to space and advice
     complexity. Lobachevskii Journal of Mathematics, 39(9):1210–1220, 2018.
[24] K. Khadiev, D. Kravchenko, and D. Serov. On the quantum and classical complexity of solving subtraction
     games. In Proceedings of CSR 2019, volume 11532 of LNCS, pages 228–236. 2019.

[25] K. Khadiev and L. Safina. Quantum algorithm for dynamic programming approach for dags. applications
     for zhegalkin polynomial evaluation and some problems on dags. In Proceedings of UCNC 2019, volume
     4362 of LNCS, pages 150–163. 2019.
[26] R. Kohavi and J. R. Quinlan. Data mining tasks and methods: Classification: decision-tree discovery.
     Handbook of data mining and knowledge discovery. Oxford University Press, 2002.

[27] Dawid Kopczyk. Quantum machine learning for data scientists. arXiv preprint arXiv:1804.10068, 2018.
[28] Breiman L., Friedman J. H., Olshen R. A., and Stone C. J. Classification and regression trees. 1984.
[29] François Le Gall. Exponential separation of quantum and classical online space complexity. Theory of
     Computing Systems, 45(2):188–202, 2009.

[30] M. A Nielsen and I. L Chuang. Quantum computation and quantum information. Cambridge univ. press,
     2010.
[31] J R. Quinlan. Discovering rules by induction from large collections of examples. Expert systems in the micro
     electronics age, 1979.

[32] J. R. Quinlan. Induction of decision trees. Machine learning, pages 81–106, 1986.
[33] J. R. Quinlan. Simplifying decision trees. International journal of man-machine studies, 27(3):221–234,
     1987.
[34] J. R. Quinlan. Improved use of continuous attributes in c4.5. Journal of Artificial Intelligence Research,
     pages 77–90, 1996.

[35] Pandya R. and Pandya J. C5.0 algorithm to improved decision tree with feature selection and reduced
     error pruning. International Journal of Computer Applications, pages 18–21, 2015.
[36] M. Schuld, I. Sinayskiy, and F. Petruccione. The quest for a quantum neural network. Quantum Information
     Processing, 13(11):2567–2586, 2014.

[37] M. Schuld, I. Sinayskiy, and F. Petruccione. An introduction to quantum machine learning. Contemporary
     Physics, 56(2):172–185, 2015.



