<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>KDID</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>From Inductive Databases to DL8.5</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Siegfried Nijssen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Schaus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICTEAM</institution>
          ,
          <addr-line>UCLouvain, Place Sainte-Barbe 2, bte L5.02.01, B-1348 Louvain-la-Neuve</addr-line>
          ,
          <country country="BE">Belgium;</country>
          <institution>TRAIL Institute</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>20</volume>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
<p>A core idea of inductive databases was the creation of systems that can be queried declaratively for patterns and models. A natural question during the development of inductive databases was how such systems would work for one of the most popular models in machine learning, the decision tree. In our first work on this topic, in 2007, we introduced DL8, an algorithm that showed how to calculate optimal decision trees from itemset patterns. Driven by an interest in imposing additional requirements on predictive models, such as requirements of interpretability and fairness, the calculation of optimal decision trees has in recent years gained a lot of interest again. In this context, we proposed an extension of DL8, DL8.5, which obtained better performance than competitor systems. This abstract describes how research on inductive databases led to these recent systems and provides an overview of these systems.</p>
      </abstract>
      <kwd-group>
<kwd>Decision trees</kwd>
        <kwd>Search algorithms</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Inductive databases</kwd>
        <kwd>Pattern mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A core idea of inductive databases was to create systems similar to database systems that would
allow a user to query a database for patterns and models in a declarative manner: a user could
specify requirements on patterns or models in a declarative language, after which this system
would be responsible for finding the patterns or models satisfying the requirements. Initially, a
lot of this research focused on developing systems for patterns: indeed, in the first iterations of
the international workshop on Knowledge Discovery in Inductive Databases (KDID), the
majority of the papers focused on pattern mining [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], and languages were studied that would
allow a user to impose constraints on patterns.
      </p>
      <p>
        However, as it was recognized that predictive models are very important in machine learning
and data mining, subsequent studies considered inductive databases that would support models
as well. One such model was the decision tree, as reported on in the KDID workshop of
2006 [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. The general idea was to create a system that would allow users to formulate and
answer queries such as:
      </p>
      <p>Given a database D
Find a classification tree T</p>
      <p>
        Such that
• accuracy(T) is maximal
• depth(T) &lt; maxdepth
• ∀ l ∈ leaves(T): support(l) ≥ minsup
Here maxdepth and minsup are parameters specified by the user, and users would have the
ability to add additional constraints, as required by the application context. These queries would
be expressed in a form of SQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (following the ADReM approach using Mining Views) or
using a form of logic [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; an underlying algorithm would find trees satisfying these conditions.
      </p>
      <p>
        The constraints proposed initially were relatively simple in nature, involving characteristics
such as accuracy, size and depth. However, soon it was realized that other constraints could
also be relevant, such as on the privacy-preserving nature or the cost of the decision trees, and
these were added to the language of constraints that inductive databases could support [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>For many years, this work went unnoticed. However, in recent years the interest in imposing
additional requirements on predictive models has risen significantly: issues of explainability,
fairness, and privacy have grown dramatically in importance. This has led to a renewed interest
in the question of how to impose requirements on decision tree models. Here, the work on
inductive databases has recently been shown to be particularly relevant. In this abstract, we
present a short summary of how the work on inductive databases led to recent state-of-the-art
algorithms for learning optimal decision trees.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Algorithms in Inductive Databases</title>
      <p>
        While creating a language for formulating some form of decision tree queries is not very
difficult, an important challenge for inductive database systems was how to create algorithms
for answering these queries. Traditional algorithms for learning decision trees, such as CART,
are heuristic in nature and have no functionality to ensure that good trees are found under
constraints [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Indeed, one variant of the decision tree learning problem under constraints
was shown in 1976 to be NP-complete, and similar problems are assumed to be NP-complete as
well [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        A first attempt to address this problem in the context of inductive databases was made by
Fromont et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in 2006. This work proposed an exhaustive search algorithm that would
enumerate all possible decision trees up to a certain size, filtering out those that would not
meet the constraints specified by the user. When accuracy was used as optimisation criterion, a
bound based on accuracy was also used to prune the search space.
      </p>
      <p>The scalability of this approach was still limited; it worked only on the smallest
databases. A challenge remained: how to answer inductive queries on more realistic databases.</p>
      <p>
        One idea that surfaced here was based on the observation that significant effort in the inductive
databases community had been spent on building efficient algorithms for pattern mining
and itemset mining in particular [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. We asked the question: given the high performance of
itemset mining algorithms, can databases with patterns be used to efficiently find decision trees
under constraints?
      </p>
      <p>
        In 2007, this led to our proposal for the DL8 algorithm (‘Decision trees from Lattices’) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
core idea underlying this algorithm is that a decision tree consists of paths, paths are itemsets,
and hence decision trees can be constructed from itemsets. Indeed, our initial implementation
in fact operated on the output of a frequent itemset mining algorithm (Eclat).
      </p>
      <p>Compared to earlier algorithms, a key element of the DL8 algorithm is that it solves the
problem of finding a decision tree using dynamic programming. For the following problem:
[Depth-Constrained Accurate Trees]
Given a Boolean database D with examples labeled in two classes
Find a classification tree T
Such that
• error(T) is minimal
• depth(T) ≤ maxdepth
the following recursive equation is at the basis of this dynamic programming approach:
DL8-Error(I) =
  min_{f ∈ ℱ} ∑_{i ∈ items(f)} DL8-Error(I ∪ {i})   if |I| &lt; maxdepth;
  leaf_error(I)                                      if |I| = maxdepth.</p>
      <p>Here the recursion starts at DL8-Error(∅); I is an itemset, ℱ is the set of features in the database,
items(f) returns two items for each feature f (one item for the value 0, one item for the value 1),
and leaf_error(I) is the number of examples in the minority class among the examples covered
by the itemset I. In words, the tree with minimal error is obtained by picking in the root the
feature which leads to the lowest sum of errors for the left-hand and right-hand child.</p>
      <p>
        The dynamic programming approach is based on storing the value DL8-Error(I) for itemsets I,
such that these values do not need to be recalculated later on when the same features are considered in
a different order. This trick has the effect that the algorithm does not need to enumerate all trees,
but only needs to enumerate itemsets. We showed in 2007 that this makes it possible to find
optimal decision trees for a much larger number of databases [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Here we used a customized
search algorithm that exploited many of the implementation tricks of itemset mining algorithms
at the time.
      </p>
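      <p>To make this recursion concrete, the following minimal Python sketch implements the dynamic programming idea with memoization over itemsets. It is an illustration only, not the actual DL8 implementation: the function names (dl8_error, covers), the (feature, value) item representation, and the guard for exhausted features are all ours, chosen for readability.</p>
      <preformat>
from functools import lru_cache

# Illustrative sketch of DL8's recursion; not the actual implementation.
# A database is a list of (features, label) pairs, where features is a
# tuple of 0/1 values and label is 0 or 1. An item (f, v) tests whether
# feature f has value v; an itemset (a path in the tree) is a frozenset.

def covers(example, itemset):
    """True if the example satisfies every item on the path."""
    return all(example[f] == v for (f, v) in itemset)

def dl8_error(database, num_features, maxdepth):
    def leaf_error(itemset):
        # Number of covered examples in the minority class.
        labels = [y for (x, y) in database if covers(x, itemset)]
        return min(labels.count(0), labels.count(1))

    @lru_cache(maxsize=None)          # the itemset store of DL8
    def error(itemset):
        if len(itemset) == maxdepth:
            return leaf_error(itemset)
        tested = {f for (f, _) in itemset}
        candidates = [f for f in range(num_features) if f not in tested]
        if not candidates:            # no feature left to test: make a leaf
            return leaf_error(itemset)
        # Pick the feature whose two children have the lowest summed error.
        return min(sum(error(itemset | frozenset({(f, v)})) for v in (0, 1))
                   for f in candidates)

    return error(frozenset())
      </preformat>
      <p>For example, on a database with three Boolean features, dl8_error(database, 3, 2) returns the lowest number of misclassified training examples achievable by any tree of depth at most 2; thanks to the memoization, each itemset is evaluated only once, regardless of the order in which its features are tested.</p>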
      <p>While in our example above we use a depth constraint, in our original publication we did
not give this constraint much attention; we introduced an optimisation which allowed the
algorithm to find optimal decision trees specifically in cases where this constraint was absent, and
did our experiments for that setting. In these experiments we found that in some cases optimal
decision trees have better predictive performance than heuristically learned trees, while in other
cases their performance is worse.</p>
      <p>
        We subsequently showed that this approach of dynamic programming over itemsets can also
be used in other settings: when adding a regularisation term in the optimisation criterion, when
taking into account costs and when taking into account a score for discrimination [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Optimal Decision Trees using MIP</title>
      <p>
        For a number of years there was little interest in the problem of finding optimal decision trees.
A publication by Bertsimas and Dunn [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in 2017 seems to have been important in the revival
of the topic, attracting wide interest. Their work showed the following:
• for the problem of finding Depth-Constrained Accurate Trees, optimal decision trees in
many cases perform better than heuristically learned trees;
• the problem of finding optimal trees can be formalized as a Mixed Integer Programming
problem, allowing the use of Mixed Integer Programming solvers to find such trees.
These results attracted particular interest among researchers with an expertise in
Mixed Integer Programming. A number of subsequent publications demonstrated improvements
in how to model decision trees in MIP [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ] and applied the MIP approach to include
other forms of optimisation criteria, such as criteria based on fairness [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. In particular, the fact
that additional requirements can be imposed simply by adding constraints to the MIP formulation makes
this approach attractive: no new algorithms need to be developed, and expertise in
mathematical modeling is sufficient to solve other forms of learning problems.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Reviving DL8</title>
      <p>
        Bertsimas and Dunn were clearly unaware of prior work on learning decision trees in the
inductive database context; hence, their publication also did not include a comparison with our
earlier work on DL8. In 2019 we decided to compare our original DL8 implementation with one
of the optimized versions of the MIP approach, BinOCT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], on exactly the same optimisation
problem, that is, a problem with a depth constraint. The results are reproduced in Figure 1,
where it is shown how much time is needed to find an optimal tree for a number of instances
(where each instance corresponds to a dataset). The comparison showed that the performance
of the MIP-based approach was much worse than that of the DL8 algorithm. BinOCT could only
find optimal trees within reasonable time for a depth of 2, and finding these trees still took much
more time than required by DL8. This led us to give DL8 a second look, in particular in the
context of depth-constrained decision trees. This led to the following additional contributions.
      </p>
      <p>[Figure 1: Runtime in seconds (log scale) versus the number of instances solved, in three panels for maximum depths 2, 3 and 4, comparing BinOCT with the CP-based algorithms DL8 and DL8.5.]</p>
      <p>First, we studied whether the idea of using bounds during a branch-and-bound search could
be added to DL8. This turned out to be possible, and was implemented in an algorithm that
we called DL8.5. Results for this algorithm are also shown in Figure 1: a significant further
speed-up compared to DL8 was obtained, making the gap with the MIP-based approach even larger.</p>
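      <p>The principle can be illustrated by threading an upper bound through the recursion shown earlier: the error of the first child tightens the bound under which the second child is explored, and a split is abandoned as soon as it can no longer improve on the best tree found so far. The sketch below reuses the same toy data representation and illustrative names as before; DL8.5's actual bounds, and in particular their interaction with the itemset cache, are considerably more refined.</p>
      <preformat>
INF = float('inf')

# Sketch of the branch-and-bound idea added in DL8.5 (illustrative only).
def covers(example, itemset):
    return all(example[f] == v for (f, v) in itemset)

def leaf_error(database, itemset):
    labels = [y for (x, y) in database if covers(x, itemset)]
    return min(labels.count(0), labels.count(1))

def error_bb(database, num_features, maxdepth,
             itemset=frozenset(), upper=INF):
    """Minimal subtree error below `itemset` if smaller than `upper`;
    otherwise `upper` is returned and the branch is pruned."""
    if len(itemset) == maxdepth:
        return leaf_error(database, itemset)
    tested = {f for (f, _) in itemset}
    candidates = [f for f in range(num_features) if f not in tested]
    if not candidates:                # no feature left: make a leaf
        return leaf_error(database, itemset)
    best = upper
    for f in candidates:
        # Solve the left child under the current best bound ...
        left = error_bb(database, num_features, maxdepth,
                        itemset | {(f, 0)}, best)
        if left >= best:
            continue                  # this split cannot improve `best`
        # ... and the right child under what remains of the bound.
        right = error_bb(database, num_features, maxdepth,
                         itemset | {(f, 1)}, best - left)
        best = min(best, left + right)
    return best
      </preformat>
      <p>A value computed under one bound is not necessarily reusable under another, so combining this pruning with the memoization of DL8 requires some care in what is stored for pruned subproblems.</p>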
      <p>
        A perceived strength of the MIP-based approach is that it allows one to model other learning
problems simply by changing the MIP formulation, similar to how inductive databases would
allow one to find many different types of models. To make DL8.5 also more generally usable, we
created a Python library which allows users to implement their own scoring function
in Python [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]; this makes it possible to easily use DL8.5 for regression problems, clustering
problems, and more, simply by writing a small amount of Python code. This brings DL8.5
closer in practice to a system that allows writing queries.
      </p>
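      <p>A minimal sketch of using this library is shown below; it follows the scikit-learn-style interface described in the PyDL8.5 paper [17], but the class and parameter names may differ across versions of the package, and the dataset is a toy example.</p>
      <preformat>
import numpy as np
from dl85 import DL85Classifier  # from the pydl8.5 package (assumed import path)

# A toy Boolean dataset: six examples, three binary features, two classes.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1],
              [1, 1, 1],
              [0, 0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

# The Depth-Constrained Accurate Trees query of Section 2:
# a tree of depth at most 2 that minimises training error.
clf = DL85Classifier(max_depth=2)
clf.fit(X, y)
print(clf.predict(X))
      </preformat>
      <p>Custom optimisation criteria are supported in a similar way, by passing a user-defined Python error function to the learner instead of relying on the default classification error.</p>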
      <p>
        We studied whether the idea of finding optimal trees can also be extended towards forests of
decision trees, and showed that this is the case for LP-Boost based methods [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The addition
of further constraints, however, remains an unsolved problem in this setting.
      </p>
      <p>
        Compared to MIP, a possible weakness of DL8.5 is its memory use: it stores large numbers of
itemsets in order to calculate decision trees from them. In practice this means that the algorithm
can run out of memory on machines with small amounts of memory. We addressed this challenge
in an extension of DL8.5 which occasionally removes itemsets from memory; while
this may mean these itemsets need to be recalculated later, with proper heuristics we can
ensure that this does not happen too often. This makes it possible to run the algorithm
in memory-constrained environments at the price of only a modest increase in run time; indeed,
even under memory constraints DL8.5 remains much faster than MIP-based approaches [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
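      <p>The underlying trade-off can be illustrated with a size-bounded memoization cache: once the cache is full, entries are evicted, and an evicted itemset is simply recomputed if the search needs it again. The sketch below uses a least-recently-used eviction policy purely for illustration; the heuristics studied in [19] for deciding what to evict are more sophisticated.</p>
      <preformat>
from collections import OrderedDict

# Illustrative size-bounded cache mapping itemsets (frozensets) to their
# stored error values. When capacity is exceeded, the least recently
# used entry is evicted; the search recomputes it later if needed.
class BoundedCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, itemset):
        if itemset not in self.store:
            return None                      # miss: caller recomputes
        self.store.move_to_end(itemset)      # mark as recently used
        return self.store[itemset]

    def put(self, itemset, value):
        self.store[itemset] = value
        self.store.move_to_end(itemset)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
      </preformat>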
      <p>
        Despite its better performance, DL8.5 still takes a significant amount of time to calculate
an optimal tree on some datasets. We asked the following question: suppose the user stops the
algorithm after a certain amount of time, can we still ensure that the algorithm will return a
good tree at that moment? This led us to study other orders for traversing the search space [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        The good performance of DL8.5 has led other researchers to build on this algorithmic
approach. In particular, Demirovic et al. proposed MurTree [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which improves the performance
of DL8.5 further by proposing a specialised algorithm for trees of depth 2 and by improving the
bounds used in the branch-and-bound search. In this work low-level optimisations are used that
have some resemblance to the optimisations that were also used in itemset mining algorithms
some time ago. This study, which used an implementation created from scratch, also
independently confirmed the superior performance of DL8-style algorithms over MIP-based
approaches for finding decision trees on Boolean data.
      </p>
      <p>
        An extension of DL8.5’s algorithm was also used to find decision trees for certain types of
non-linear scoring functions [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we provided a short overview of optimal decision tree learning as initially studied
in the context of the Knowledge Discovery in Inductive Databases workshop, and of how
this led to a type of algorithm, DL8, which has recently been studied in detail again and has led
to numerous new publications.</p>
      <p>Given that many new publications on algorithms for finding optimal decision trees have already
appeared in the first half of 2022 alone, we believe that this work will continue to be
relevant. Other older results in the domains of inductive databases and pattern mining may
also gain relevance here once more.</p>
      <p>For instance, many of the current approaches are focused on depth-constrained decision trees.
However, in our past work we showed that optimisations based on condensed representations
of itemsets make it possible to find optimal decision trees also without such a constraint. The use of
condensed representations may hence also merit a new look in this context.</p>
      <p>Motivated by concerns over discrimination, privacy and explainability, the integration of
additional constraints continues to be relevant. We expect that DL8-style algorithms will
continue to be important in this context.</p>
      <p>
        DL8’s performance derives from dynamic programming, where it assumed that the left-hand
and right-hand side of a test in a tree can be optimised independently. For some scoring functions
this independence does not apply. Search algorithms such as developed by Fromont et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
could still be relevant here.
      </p>
      <p>At its core, DL8 still performs a form of itemset mining. Many ideas present in the itemset
mining literature, such as FP-Trees [<xref ref-type="bibr" rid="ref23">23</xref>], have not been evaluated in the context of DL8 yet.</p>
      <p>A remaining challenge for DL8-style algorithms continues to be how to treat continuous or
numerical data. The integration of better discretization algorithms could be relevant here.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Klemettinen</surname>
          </string-name>
          , R. Meo (Eds.),
          <source>Proceedings of the First International Workshop on Inductive Databases, 20 August</source>
          <year>2002</year>
          , Helsinki, Finland, Helsinki University Printing House, Helsinki,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Boulicaut</surname>
          </string-name>
          , S. Dzeroski (Eds.),
          <source>Proceedings of the Second International Workshop on Inductive Databases</source>
          , 22 September, Cavtat-Dubrovnik, Croatia, Rudjer Boskovic Institute, Zagreb, Croatia,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>É.</given-names>
            <surname>Fromont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Blockeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Struyf</surname>
          </string-name>
          ,
          <article-title>Integrating decision tree learning into inductive databases</article-title>
          , in: S. Dzeroski, J. Struyf (Eds.),
          <source>Knowledge Discovery in Inductive Databases, 5th International Workshop</source>
          , KDID 2006, Berlin, Germany, September 18,
          <year>2006</year>
          ,
          Revised Selected and Invited Papers, volume
          <volume>4747</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2006</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          , L. De Raedt,
          <article-title>IQL: A proposal for an inductive query language</article-title>
          , in: S. Dzeroski, J. Struyf (Eds.),
          <source>Knowledge Discovery in Inductive Databases, 5th International Workshop</source>
          , KDID 2006, Berlin, Germany, September 18,
          <year>2006</year>
          ,
          Revised Selected and Invited Papers, volume
          <volume>4747</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2006</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          , É. Fromont,
          <article-title>Optimal constraint-based decision tree induction from itemset lattices</article-title>
          ,
          <source>Data Min. Knowl. Discov</source>
          .
          <volume>21</volume>
          (
          <year>2010</year>
          )
          <fpage>9</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Olshen</surname>
          </string-name>
          ,
          <source>Classification and Regression Trees</source>
          , Chapman and Hall/CRC,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hyafil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Rivest</surname>
          </string-name>
          ,
          <article-title>Constructing optimal binary decision trees is NP-complete</article-title>
          ,
          <source>Inf. Process. Lett.</source>
          <volume>5</volume>
          (
          <year>1976</year>
          )
          <fpage>15</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          (Eds.),
          <source>FIMI '03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, 19 December 2003, Melbourne, Florida, USA</source>
          , volume
          <volume>90</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2003</year>
          . URL: http://ceur-ws.org/Vol-90.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          (Eds.),
          <source>FIMI '04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, Brighton, UK, November 1, 2004</source>
          , volume
          <volume>126</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2005</year>
          . URL: http://ceur-ws.org/Vol-126.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          , É. Fromont,
          <article-title>Mining optimal decision trees from itemset lattices</article-title>
          , in: P. Berkhin, R. Caruana, X. Wu (Eds.),
          <source>Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , San Jose, California, USA,
          August 12-15,
          <year>2007</year>
          , ACM,
          <year>2007</year>
          , pp.
          <fpage>530</fpage>
          -
          <lpage>539</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bertsimas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <article-title>Optimal classification trees</article-title>
          ,
          <source>Mach. Learn</source>
          .
          <volume>106</volume>
          (
          <year>2017</year>
          )
          <fpage>1039</fpage>
          -
          <lpage>1082</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verwer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Learning optimal classification trees using a binary linear program formulation</article-title>
          ,
          <source>in: The Thirty-Third AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2019</year>
          , AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>1625</fpage>
          -
          <lpage>1632</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vayanos</surname>
          </string-name>
          ,
          <article-title>Learning optimal classification trees: Strong max-flow formulations</article-title>
          , CoRR abs/2002.09142 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2002.09142.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Michini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Shattering inequalities for learning optimal decision trees</article-title>
          , in: P. Schaus (Ed.),
          <source>Integration of Constraint Programming, Artificial Intelligence, and Operations Research - 19th International Conference, CPAIOR 2022</source>
          , Los Angeles, CA, USA, June 20-23,
          <year>2022</year>
          , Proceedings, volume
          <volume>13292</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Azizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vayanos</surname>
          </string-name>
          ,
          <article-title>Learning optimal and fair decision trees for nondiscriminative decision-making</article-title>
          ,
          <source>in: The Thirty-Third AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2019</year>
          , AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>1418</fpage>
          -
          <lpage>1426</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. P.</given-names>
            <surname>Calmon</surname>
          </string-name>
          ,
          <article-title>Fairness without imputation: A decision tree approach for fair prediction with missing values</article-title>
          ,
          <source>in: Thirty-Sixth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2022</year>
          , AAAI Press,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aglin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          , P. Schaus,
          <article-title>PyDL8.5: a library for learning optimal decision trees</article-title>
          , in: C.
          <string-name>
            <surname>Bessiere</surname>
          </string-name>
          (Ed.),
          <source>Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2020</year>
          , ijcai.org,
          <year>2020</year>
          , pp.
          <fpage>5222</fpage>
          -
          <lpage>5224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aglin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaus</surname>
          </string-name>
          ,
          <article-title>Assessing optimal forests of decision trees</article-title>
          ,
          <source>in: 33rd IEEE International Conference on Tools with Artificial Intelligence</source>
          , ICTAI 2021, Washington, DC, USA, November 1-3,
          <year>2021</year>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aglin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaus</surname>
          </string-name>
          ,
          <article-title>Learning optimal decision trees under memory constraints</article-title>
          ,
          <source>in: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kiossou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ratheil</surname>
          </string-name>
          ,
          <article-title>Time constrained DL8.5 using limited discrepancy search</article-title>
          ,
          <source>in: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Demirovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lukina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hebrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leckie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Stuckey</surname>
          </string-name>
          , MurTree:
          <article-title>Optimal decision trees via dynamic programming and search</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <fpage>26:1</fpage>
          -
          <lpage>26:47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>E.</given-names>
            <surname>Demirovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Stuckey</surname>
          </string-name>
          ,
          <article-title>Optimal decision trees for nonlinear metrics</article-title>
          ,
          <source>in: Thirty-Fifth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2021</year>
          , AAAI Press,
          <year>2021</year>
          , pp.
          <fpage>3733</fpage>
          -
          <lpage>3741</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Mining frequent patterns without candidate generation</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>29</volume>
          (
          <year>2000</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>