      Improving the efficiency of Euclidean TSP
        solving in Constraint Programming by
      predicting effective nocrossing constraints

Elena Bellodi1[0000−0002−3717−3779] , Alessandro Bertagnon1[0000−0003−2390−0629] ,
Marco Gavanelli1[0000−0001−7433−5899] , and Riccardo Zese1[0000−0001−8352−6304]

                               Department of Engineering, University of Ferrara
                                 Via G. Saragat 1
                                44122 Ferrara, Italy
                      {elena.bellodi,alessandro.bertagnon,
                     marco.gavanelli,riccardo.zese}@unife.it



        Abstract. The Traveling Salesperson Problem (TSP) is a well-known
        problem addressed in the literature through various techniques, includ-
        ing Integer Linear Programming, Constraint Programming (CP) and Lo-
        cal Search. Many real life instances belong to the subclass of Euclidean
        TSPs, in which the nodes to be visited are associated with points in the
        Euclidean plane, and the distance between them is the Euclidean dis-
        tance. A well-known property of the Euclidean TSP is that no crossings
        can exist in an optimal solution. In a previous publication, we exploited
        this property to speed up the solution of Euclidean instances in CP, by
        imposing a number of so-called no-overlapping constraints. The number
        of imposed constraints is quadratic in the number of nodes of the TSP.
        In this work, we observe that not all the no-overlapping constraints are
        equally useful: by experimental analysis, some of them provide a speedup,
        while others only introduce overhead.
        We apply a supervised machine learning approach to learn a binary
        classifier, with the objective of imposing only those no-overlapping
        constraints that have been classified as effective. Preliminary experiments
        support the validity of the idea.

        Keywords: Constraint Programming · Supervised Machine Learning ·
        Euclidean TSP.


1     Introduction
The TSP is one of the best-known problems in computer science; given a graph
in which the edges have non-negative weights (usually interpreted as traveling
costs), the objective is to find a circuit visiting each node exactly once and with
minimal total cost.
    Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    This work is partially supported by GNCS-INdAM.
     The TSP is notoriously NP-hard, and it has been approached through various techniques. The most successful, to date, is Integer Linear Programming and its variants; the current state of the art is the TSP solver Concorde [2], which employs various techniques including local search and branch-and-cut.
     Concorde, however, cannot address problems in which additional constraints
exist, such as the TSP with time windows (in which each of the nodes can be
visited only within a given time window), or the Vehicle Routing Problem (in
which more than one vehicle exists).
     The TSP has also been addressed in the Constraint Programming literature, which offers more flexibility and lets one add so-called side constraints [10, 25, 5, 14, 15, 12].
     One interesting case of TSP is the Euclidean TSP, in which the nodes to
be visited are associated with points in the Euclidean plane, and the distance
between each pair of points is computed through the Euclidean distance. The
Euclidean TSP is NP-hard [20], but it admits Polynomial Time Approximation
Schemes [3, 29]. Despite the impressive theoretical interest of these celebrated
results, the usual way to address the Euclidean TSP is to convert it into a
TSP by computing the distance matrix between each pair of nodes, and then
use a TSP solver (e.g., Concorde) to find the optimal solution. This method
completely disregards the additional information intrinsic in the Euclidean TSP
formulation, such as the coordinates in the plane of the nodes.
     In Constraint Programming, two methods exploit the information about the
problem coordinates to speed up the solution process of the TSP. Deudon et
al. [12] train a Deep Neural Network with the point coordinates in order to learn
efficient heuristics to explore the search space.
     In the other [6], we proposed the first approach in which the information about the point coordinates is used to prune the search space of the constraint programming formulation. The work started from the well-known
observation that in an optimal Euclidean TSP two edges cannot cross each other,
otherwise there exists another circuit with shorter length. We proposed a con-
straint nocrossing that imposes that the edges exiting from two given nodes
do not cross each other. This constraint is imposed for each of the $\frac{n(n-1)}{2}$ pairs of nodes. Experimental results show the effectiveness of the approach.
     Like all constraints, each of them must be present in memory; it can be awoken (activated) when suitable conditions occur (typically, the removal of an element from one domain), possibly performs some pruning, and then becomes dormant again. When the solver wakes up a constraint, some computation time is spent in awaking and scheduling the constraint and in performing the checks required by its logic. Nevertheless, the experimental results in [6] show that, globally, the additional pruning widely compensates for the introduced overhead.
     However, although globally the set of constraints is worth imposing, it might still be the case that some of the $\frac{n(n-1)}{2}$ constraints never perform any pruning, and only introduce overhead. Stated otherwise, one wonders whether all these $\frac{n(n-1)}{2}$ constraints contribute equally to the effectiveness of the method, or whether some of them perform strong pruning, while others perform little or none. If one were able to guess a priori which constraints will perform pruning and which, instead, will only introduce overhead, one could avoid imposing the useless ones, reducing the overhead associated with the set of nocrossing constraints while retaining all (or almost all) of their pruning.
    In this paper, we experimentally study how much pruning is performed by each of the $\frac{n(n-1)}{2}$ constraints. We label each constraint as useful or useless
considering the number of times the constraint is woken and the amount of
pruning it performs. We then learn a random forest binary classifier to predict
which of the constraints in a new instance will be useful and which will be useless.
    Preliminary experimental results show that the approach is promising.
    The rest of the paper is organized as follows: after some preliminaries, we
recap from [6] the basic idea of removing crossings and the declarative semantics
of the nocrossing constraint (Section 3). Section 4 explains the data we collected
running the Euclidean TSP solver on various instances. Section 5 is devoted to
the Machine Learning approach to learn the classifier. In Section 6 we show the experimental results both of the classifier and of the resulting Euclidean TSP solver exploiting its predictions. Section 7 discusses related work. Finally, Section 8 concludes the paper and provides insights into future work.

2    Preliminaries
Let $G = (V, E, w)$ be a weighted graph, where $V$ is a set of nodes, $E$ is a set of edges, and $w : E \to \mathbb{R}^+$. A path is a sequence $p_{v_{s_0}\text{-}v_{s_k}} = v_{s_0} e_{s_0,s_1} v_{s_1} \ldots e_{s_{k-1},s_k} v_{s_k}$ such that
 1. $v_{s_0}, v_{s_1}, \ldots, v_{s_k} \in V$ and are all distinct, and
 2. $e_{s_0,s_1}, \ldots, e_{s_{k-1},s_k} \in E$.
Since a path is uniquely identified by the sequence of its nodes (or of its edges) in the proper order, to simplify the notation we will often write paths as sequences of nodes. The length of a path $p$ is the sum of the weights of its edges: $L(p) = \sum_{i=0}^{k-1} w(e_{s_i,s_{i+1}})$. The sequence obtained by appending $e_{s_k,s_0}$ to a path $p_{v_{s_0}\text{-}v_{s_k}}$ is also called a circuit $c$.
     The Metric TSP is a TSP in which there is an edge connecting each pair of nodes, and the distance function $w$ satisfies the triangle inequality: $w(e_{a,c}) \leq w(e_{a,b}) + w(e_{b,c})$.
     A Euclidean TSP is a Metric TSP in which the distance function is the Euclidean distance. Let $\mathcal{P} = \{P_1, \ldots, P_n\}$ be a set of points, where $P_i = (x_i, y_i)$. The graph associated with $\mathcal{P}$ is $G_\mathcal{P} = (\mathcal{P}, E^\mathcal{P}, w^\mathcal{P})$, where $E^\mathcal{P} = \{e_{i,j} \equiv (P_i, P_j) \mid P_i, P_j \in \mathcal{P}, i \neq j\}$ and $w(P_i, P_j) = d(P_i, P_j)$, where $d$ is the Euclidean distance.

    In the Constraint Programming literature, three main constraint models have
been proposed to address the TSP: the permutation representation, the successor
representation and the set variable representation [5].
    In this paper we adopt the successor representation; it is defined with a set of $n$ variables Next, each ranging over the set of available nodes. $Next_i = j$ means that the successor of node $i$ is node $j$. The constraint model includes
 – an alldifferent(N ext) constraint [32], that imposes that each node is the
   successor of exactly one other node,
 – a circuit(N ext) constraint [10] that excludes the creation of sub-circuits,
   i.e., circuits that do not involve the whole set of nodes,
 – and an objective function aimed at minimizing the total length of the TSP.
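To make the representation concrete, the following Python sketch (not the ECLiPSe implementation used in the experiments) shows how a successor array encodes a tour and how its Euclidean length is computed; the names points and next_ are illustrative.

```python
import math

def tour_length(points, next_):
    """Total Euclidean length of the circuit encoded by the Next variables:
    points[i] is the (x, y) pair of node i, next_[i] the successor of node i."""
    return sum(math.dist(points[i], points[next_[i]]) for i in range(len(points)))

def decode_tour(next_, start=0):
    """Recover the visiting order from the successor array,
    assuming next_ encodes a single circuit over all the nodes."""
    tour, node = [start], next_[start]
    while node != start:
        tour.append(node)
        node = next_[node]
    return tour

# Example: the unit square visited along its perimeter.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
next_ = [1, 2, 3, 0]               # Next_0 = 1, Next_1 = 2, ...
print(decode_tour(next_))          # [0, 1, 2, 3]
print(tour_length(points, next_))  # 4.0
```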


3    Avoiding crossings
The following is a well-known result in the literature.

Theorem 1 ([16]). Let $c^*$ be an optimal tour of a Metric TSP. Then, for each $e_{i,j}, e_{k,l} \in c^*$ such that $\{i, j, k, l\}$ are all different, the segments satisfy $\overline{P_iP_j} \cap \overline{P_kP_l} = \emptyset$.


[Figure 1 shows four points $P_i$, $P_k$, $P_l$, $P_j$, the crossing segments $\overline{P_iP_j}$ and $\overline{P_kP_l}$ meeting at a point $Q$, the paths $p_{i\text{-}l}$ and $p_{j\text{-}k}$, and the dotted replacement edges $\overline{P_iP_k}$ and $\overline{P_lP_j}$.]

                                 Fig. 1. A self-crossing circuit.



Proof. (sketch). In Fig. 1, instead of taking $\overline{P_iP_j}$ and $\overline{P_kP_l}$, a shorter tour chooses the dotted edges $\overline{P_iP_k}$ and $\overline{P_lP_j}$ (reversing one of the two paths): by the triangle inequality applied at the crossing point $Q$, $|P_iP_k| + |P_lP_j| < |P_iQ| + |QP_k| + |P_lQ| + |QP_j| = |P_iP_j| + |P_kP_l|$.
   In order to speed up the search, in [6] we proposed a constraint that avoids, during search, the solutions that include crossings. The nocrossing constraint

                                nocrossing(i, Next_i, j, Next_j)

imposes that the segment $\overline{P_i P_{Next_i}}$ and the segment $\overline{P_j P_{Next_j}}$ do not intersect, or intersect at most in one of the endpoints $P_i$ or $P_j$.
    Clearly, this constraint should be imposed for each pair of nodes, i.e. a
quadratic number of constraints.
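At the ground level, the geometric test underlying the constraint is a standard segment-crossing check. The following Python sketch is one possible formulation based on orientation tests; it treats a shared endpoint as allowed, as in the declarative semantics above, and ignores degenerate collinear overlaps for brevity.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a):
    > 0 left turn, < 0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(p1, p2, q1, q2):
    """True iff segments p1p2 and q1q2 cross in their interiors;
    touching at a shared endpoint does not count as a crossing."""
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

# The situation of Fig. 1: two diagonals of a square cross ...
print(properly_cross((0, 0), (2, 2), (0, 2), (2, 0)))  # True
# ... while segments sharing an endpoint do not.
print(properly_cross((0, 0), (2, 2), (2, 2), (3, 0)))  # False
```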
    Notice that these constraints are aimed only at improving the efficiency of
the solution, they are not necessary for its correctness; they are in fact redundant
constraints, and it is well known in the CP literature that redundant constraints
can improve the efficiency of the search.
    A natural question is whether all these constraints perform effective pruning, reducing the search space, or whether only some of them are actually useful, while others do not perform any significant pruning and only introduce overhead. In the next section, we try to answer this question.
4   The collected data

To evaluate the performance of each constraint we collected data while solv-
ing Euclidean TSP instances. In order to have a statistically significant number
of instances, we decided to use randomly-generated ones. We used the genera-
tor of the DIMACS challenge [24], which provides two-dimensional instances in
TSPLIB format consisting of integer-coordinate points. The generated instances
can belong to two different classes: uniform and clustered. Points of the instances belonging to the uniform class are uniformly distributed in a $10^6 \times 10^6$ square, while points belonging to the clustered class are located in clusters that are uniformly distributed in a $10^6 \times 10^6$ square. We randomly generated instances from
15 to 30 nodes, in both classes. For each size and class we generated 32 instances,
for a total of 1024 instances.
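For illustration, the following Python sketch mimics the two classes of instances; it is only an approximation of the DIMACS generator, whose exact clustering parameters differ, and the cluster count and spread below are illustrative choices.

```python
import random

SIDE = 10**6  # points live in a 10^6 x 10^6 square, as in the DIMACS generator

def uniform_instance(n, rng=random):
    """n integer-coordinate points uniformly distributed in the square."""
    return [(rng.randrange(SIDE), rng.randrange(SIDE)) for _ in range(n)]

def clustered_instance(n, n_clusters=4, spread=SIDE // 20, rng=random):
    """n points gathered around uniformly-placed cluster centres."""
    centres = [(rng.randrange(SIDE), rng.randrange(SIDE)) for _ in range(n_clusters)]
    clamp = lambda v: max(0, min(int(v), SIDE - 1))  # keep points in the square
    return [(clamp(rng.gauss(cx, spread)), clamp(rng.gauss(cy, spread)))
            for cx, cy in (rng.choice(centres) for _ in range(n))]
```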
    For each nocrossing constraint, in each instance, we measured three indicators: the number of activations $N_{activations}$, the number of value deletions $N_{pruned}$ from the domains of the variables involved, and the number of failures (and therefore backtracks) generated as a result of the deletion of values. The first two indicators were then combined to obtain a fourth one, denoted as RTIO and calculated as the ratio $RTIO = N_{pruned}/N_{activations}$. A constraint with a low RTIO wakes up many times without being able to perform pruning, so it only produces unwanted overhead, while a constraint with a high RTIO performs a much stronger pruning compared to the number of activations and is therefore worth imposing.
Figures 2, 3 and 4 graphically show the number of value deletions, the number
of failures and the RTIO respectively, for a typical instance of Euclidean TSP. In
each figure, the darker the color of the line, the higher the value of the corre-
sponding indicator. Figure 5 furthermore illustrates the nocrossing constraints
that did not perform any pruning ($N_{pruned} = 0$). We produced many of these figures in the hope of finding some significant pattern that could help us identify the useful or the useless instances of nocrossing constraints; unfortunately, we were not able to observe any interesting pattern. We then decided to introduce a machine learning step.
    Each constraint was labelled by means of the RTIO, which can be seen as an
indicator of the “goodness” of a constraint. A constraint belonging to a certain
instance I is labeled as useful if its RTIO is greater than the arithmetic mean
calculated on the RTIO of all the constraints belonging to instance I, otherwise
it is labeled as useless. The relation we wish to learn could be seen as a func-
tion mapping each pair of points (in a generic Euclidean TSP instance) to the
set {useful, useless}. In principle, each point could be identified solely by its coordinates, but this information could be too specific: the effectiveness of a constraint should be independent of rigid transformations of the whole set of points, such as rotations, axial symmetries, or even scaling.
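In Python, the labelling rule just described can be sketched as follows, assuming per-constraint counters have been collected for one instance; giving RTIO 0 to constraints that were never activated is our own assumption.

```python
def label_constraints(stats):
    """stats maps a pair (i, j) to (n_activations, n_pruned) for one instance.
    A constraint is 'useful' iff its RTIO exceeds the instance's mean RTIO."""
    rtio = {pair: (pruned / act if act else 0.0)
            for pair, (act, pruned) in stats.items()}
    mean = sum(rtio.values()) / len(rtio)
    return {pair: ("useful" if r > mean else "useless")
            for pair, r in rtio.items()}

# Tiny example: one constraint prunes often, the other never.
print(label_constraints({(0, 1): (10, 8), (0, 2): (10, 0)}))
# {(0, 1): 'useful', (0, 2): 'useless'}
```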
    For this reason, besides the (normalized) coordinates of each point, we computed a set of features trying to synthesize further information that is invariant with respect to these transformations. As guidance, we chose some features reflecting information exploited by effective TSP solving algorithms,
Fig. 2. Graphical representation of the number of value deletions performed by each
nocrossing constraint in a Euclidean TSP instance. The darker the color of the seg-
ment, the higher the number of value deletions.




Fig. 3. Graphical representation of the number of failures generated by each
nocrossing constraint in a Euclidean TSP instance. The darker the color of the seg-
ment, the higher the number of failures.




Fig. 4. Graphical representation of the RTIO of each nocrossing constraint in a Eu-
clidean TSP instance. The darker the color of the line, the higher the RTIO value.
Fig. 5. Graphical representation of the nocrossing constraints that did not delete
any value in a Euclidean TSP instance. A line between two points indicates that the
nocrossing constraint applied on that pair of points was unable to perform any prun-
ing. These constraints only introduce overhead.


in the hope that they could also serve as guidance for the effectiveness of the
nocrossing constraints.
    One effective algorithm for solving TSPs is the Held and Karp procedure [22], based on spanning trees. The minimum spanning tree of the set of nodes can be computed in polynomial time, and its weight is a popular valid lower bound on the value of the optimal solution. One of the properties we chose for a pair of points is whether the segment connecting them belongs to a minimum spanning tree.
    Another interesting property is the so-called necklace condition [13]. Suppose we find a set of discs, each centered on one of the points to be visited, such that the interiors of no two discs intersect. Clearly, an optimal tour must enter and exit each of the discs, so a valid lower bound is twice the sum of the radii of the discs. From this observation, another interesting property could be the distance of each node to the closest other node.
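A simple valid choice of radii, sketched below in Python, gives each point a disc of radius half the distance to its nearest neighbour, so that no two disc interiors can overlap; this particular choice is our illustration, not necessarily the one used in [13].

```python
import math

def necklace_lower_bound(points):
    """Twice the sum of disc radii, with radius_i = half the distance from
    point i to its nearest neighbour. Since r_i + r_j <= d(i, j) for every
    pair, the disc interiors are pairwise disjoint and the bound is valid."""
    total = 0.0
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest / 2
    return 2 * total
```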
    Finally, in [6] we also introduced constraints that performed pruning based on the convex hull of the set of points; that pruning was also extended to the case of interior hulls, once (during search) some of the segments in the current path had already been fixed.
    Considering what has been introduced so far, we have identified the following
15 features for each nocrossing(i, Next i , j, Next j ) constraint in the dataset:

 – XPIN, YPIN, XPJN, YPJN: normalized coordinates of the two points Pi and Pj
   involved in the constraint. These coordinates are obtained, starting from the
   original coordinates of the points, by repositioning each instance within a
   square space with a 100 unit long side;
 – DISN: Euclidean distance calculated between points Pi and Pj using normal-
   ized coordinates;
 – LEVI, LEVJ: level of points $P_i$ and $P_j$. The idea is to distinguish the points on
   the perimeter of the convex hull from the internal ones, and to have a numeric
   value suggesting how deep in the interior of the figure each point is. The
   level of a point $P$ is defined inductively with respect to the set $\mathcal{P}$ of all the
   points: $lev(P) = lev_{\mathcal{P}}(P)$. The level of a point $P$ with respect to a set $X$ is 1
   if $P$ belongs to the “exterior” of $X$ (precisely, the perimeter $Hull_X$ of the
   convex hull of $X$), and is defined inductively as 1 plus the level of $P$ in the
   “interior” set $X \setminus Hull_X$ otherwise:

   $$lev_X(P) = \begin{cases} 1 & \text{if } P \in Hull_X \\ 1 + lev_{X \setminus Hull_X}(P) & \text{otherwise} \end{cases}$$

 – CXNI, CYNI, CXNJ, CYNJ: normalized coordinates of the nearest point to Pi
   and Pj respectively;
 – NDTI, NDTJ: Euclidean distance of the closest point to Pi and Pj respectively,
   calculated using normalized coordinates;
 – NBHD: number of points contained in the circle having as diameter the seg-
   ment connecting points Pi and Pj ;
 – NMST: (boolean value) indicates whether the segment Pi Pj belongs to a min-
   imum spanning tree.
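As an example of how such features can be computed, the following Python sketch implements NBHD; by Thales' theorem, a point $Q$ lies strictly inside the circle with diameter $\overline{P_iP_j}$ exactly when the angle $P_iQP_j$ is obtuse, i.e., when the dot product $(P_i - Q) \cdot (P_j - Q)$ is negative. The other features follow similarly (e.g., NMST via any minimum-spanning-tree algorithm, LEVI/LEVJ by repeatedly peeling the convex hull).

```python
def nbhd(points, i, j):
    """NBHD: number of points strictly inside the circle whose diameter
    is the segment P_i P_j (dot-product test, by Thales' theorem)."""
    (xi, yi), (xj, yj) = points[i], points[j]
    count = 0
    for k, (x, y) in enumerate(points):
        if k in (i, j):
            continue
        if (xi - x) * (xj - x) + (yi - y) * (yj - y) < 0:
            count += 1
    return count
```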


5     Machine learning the goodness of constraint
      propagators

The dataset of labelled constraints is suitable for the application of supervised
machine learning algorithms, with the goal of learning a model able to predict
which of the constraints will be useful and which will be useless in a new un-
seen instance of the Euclidean TSP. As each constraint is labelled as useful or
useless, a binary classifier can be learnt that discriminates between the negative
class (useless, represented by a label equal to 0) and the positive class (useful,
represented by a label equal to 1).
    Among the classification techniques, the Random Forest (RF) approach is
known for its good computational performance and scalability. A random forest
is a meta estimator that fits a number of decision tree classifiers on various sub-
samples of the dataset and uses averaging to improve the predictive accuracy and
control overfitting [9]. The algorithm used in the experimental validation is the one available in the WEKA workbench for machine learning (https://www.cs.waikato.ac.nz/ml/weka/). Given a training set X of Euclidean TSP instances, learning a random forest in WEKA involves the following steps [1]:

 1. Bootstrap samples Bi for every tree ti are drawn by randomly selecting pairs
    of points (the examples) with replacement from X until the sizes of Bi and
    X are equal;
 2. A random subset of features (attributes) is selected for each Bi and used
    for the training of tree ti in the forest;
 3. An information gain metric is used to grow unpruned decision trees;
 4. The final classification result is the most popular of the individual tree pre-
    dictions.
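A minimal sketch of this learning step, using scikit-learn's RandomForestClassifier as a stand-in for WEKA's implementation (100 trees, matching the -I 100 setting reported in Section 6.1), with placeholder data in place of the real feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: one row of the 15 features of Section 4 per nocrossing
# constraint, and its useful (1) / useless (0) label.
rng = np.random.default_rng(1)
X = rng.random((1000, 15))
y = (rng.random(1000) > 0.7).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted labels for the first five constraints
```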

   The RF classifier can then be used to predict, for new unseen pairs of points, whether the nocrossing constraint should be imposed, thus avoiding the constraints that have been assigned the negative class (useless). The whole proposed procedure is the following:

 1. A training dataset of various Euclidean TSP instances, where each constraint
    has been labelled according to its own RTIO, is used to learn a RF classifier;
 2. Given any new instance of the problem, for whose constraints the three
    indicators (and so the class) are unknown, apply the classifier to find pairs
    of points that are classified as useful for imposing the nocrossing constraint;
 3. Run the instance together with the selected nocrossing constraints to solve
    the Euclidean TSP.

With step 3, we try to eliminate the time overhead that the constraints recognized as ineffective by the random forest model might introduce during the search for a solution.
    An advantage of this procedure is that, once the classifier has been learnt,
it can be reused on any new instance of the problem, without re-performing the
machine learning step, which is executed only once.
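Steps 2 and 3 of the procedure amount to filtering the quadratic set of candidate pairs through the classifier before building the constraint model, as in the Python sketch below; featurize, assumed to compute the 15 features of Section 4 for a pair (i, j), and the classifier clf are hypothetical names.

```python
from itertools import combinations

def select_useful_pairs(points, clf, featurize):
    """Return only the pairs (i, j) for which the classifier predicts that
    the nocrossing constraint will be useful (class 1)."""
    pairs = list(combinations(range(len(points)), 2))
    features = [featurize(points, i, j) for i, j in pairs]
    predictions = clf.predict(features)
    return [pair for pair, label in zip(pairs, predictions) if label == 1]

# The CP model is then built imposing nocrossing only on the returned pairs.
```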


6     Experiments

6.1   Results of the Machine Learning step

The machine learning task was carried out by training the RF classifier imple-
mented in WEKA (version 3.8.4) [34]. The training phase was controlled by the
following parameters:

 – -P 100: size of each bag, as a percentage of the training set size; the default
   value of 100 was kept;
 – -I 100: number of iterations (i.e., the number of trees in the random forest);
 – -num-slots 1: number of execution slots (thread) for constructing the en-
   semble. The default 1 means no parallelism;
 – -K 0: sets the number of randomly chosen attributes;
 – -M 1: the minimum number of instances per leaf;
 – -V 0.001: minimum numeric class variance proportion of train variance for
   split (the default value was kept);
 – -S 1: seed for the random number generator (the default value was kept).

   To investigate the performance of the classifier, a dataset of 517,120 nocrossing constraints, collected from 1024 instances of Euclidean TSP, was split into 66% for training and the remainder for testing, obtaining the performance summary shown in Table 1 and an accuracy of 0.908. Recalling
that each pair of points can be referred to as an ‘example’, the TP Rate is the
fraction of true positives (TP) over the total positive examples, the FP Rate is
the fraction of false positives (FP) over the total negative examples, Precision is
T P/(T P + F P ), Recall corresponds to the TP Rate, F-measure is the harmonic
mean of Precision and Recall, ROC and PR Area represent the areas under the
ROC and PR curves taking values in the range [0,1]. The ROC curve plots the
TP Rate versus the FP Rate, the PR curve plots the Precision versus the Recall
[11, 31]. Accuracy is the number of correctly classified examples over all examples. The highest possible value for all metrics is 1 and, except for the FP Rate, the higher the value, the better the performance.
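For instance, the F-Measure of the negative class in Table 1 can be recomputed from its Precision and Recall:

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot 0.932 \cdot 0.957}{0.932 + 0.957} \approx 0.944$$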
    The classifier shows a very good performance over the test nocrossing constraints, especially for the negative class (useless). After this preliminary test, the entire dataset was used to train the RF classifier, in order to feed the learning algorithm with more data; training took 544.41s to complete.


Table 1. Performance of the RF classifier over the fraction of instances used for test.

      Class TP Rate (Recall) FP Rate Precision F-Measure ROC Area PR Area
        0       0.957         0.301   0.932      0.944     0.945   0.984
        1       0.699         0.043   0.790      0.742     0.945   0.818




6.2   Results of the overall Euclidean TSP solver

In order to evaluate the improvements in solving time obtained thanks to the predictions made in the machine learning step, we devised a series of experiments based on randomly-generated TSPs. As in the data collection phase described in Section 4, we used the generator of the DIMACS challenge [24]. We
generated a total of 1024 test instances (different from the ones used for learn-
ing) varying the size from 15 to 30 nodes, in both uniform and clustered classes
(32 instances for each size and class).
    We compared three constraint models based on the successor representation
as introduced in Section 2:

 – the basic constraint model described in the preliminaries (denoted as ECLP),
   including the circuit and alldifferent constraints required by the suc-
   cessor representation plus the objective function;
 – the constraint model imposing the nocrossing constraint for all pairs of
   nodes (denoted as ALL);
 – the model imposing only the nocrossing constraints predicted as useful by
   the RF classifier (denoted as PRED).
[Figure 6 plot: the x-axis reports the number of solved instances (from 700 to 900), the y-axis the solving time in seconds (from 500 to 3,000); one curve per model: ECLP, ALL, PRED.]

Fig. 6. Solving time as a function of the number of solved instances. Different curves
correspond to the different models that were compared during the experiments.


    All experiments use the max regret search strategy [10].
    All algorithms are implemented in the ECLiPSe CLP language [33]. All tests were run on ECLiPSe v. 7.0, build #54, on Intel® Xeon® E5-2630 v3 CPUs running at 2.4GHz, with one core and with 1GB of reserved memory. The time limit for each run was set to 3480s.
    Figure 6 summarizes the results: it shows how many instances each of the three constraint models can solve within the time limit reported on the y-axis. The constraint model that includes only the nocrossing constraints predicted to be useful is the one obtaining the best results overall, confirming the effectiveness of the approach proposed in this paper. The selection of constraints also slightly increased the number of instances that could be solved: among the 1024 tested instances, ECLP timed out on 158 instances, ALL on 145, and PRED on 141. The randomly-generated TSP instances, the collected runtime data, and the datasets are available online (https://github.com/abertagnon/rcra-2020).


7     Related work
There exists a wide literature on combinations of constraint programming with
machine learning or data mining [8].
   One of the main ideas is portfolio selection (see, e.g., the survey [26] and references therein): given a set of algorithms (or solvers) that solve the same problem, select the best one for solving a given instance. The approach is based on obtaining data about the running time of the algorithms by collecting a large number of instances and running each of the available algorithms on each instance. Also,
for each instance a number of features are computed, hopefully synthesizing the
most important characteristics of the instance that make it easy or hard to solve.
These can be simple statistical measures, including the size of the instance, up
to measures from the literature, such as the treewidth of the instance. After
that, a classifier is learned trying to predict, given the set of features of a new
unseen instance, which of the available solvers will be the fastest for that specific
instance. Once a new instance is provided, the classifier is used to choose the
best solver for solving it.
    Our work could, in principle, be seen as an algorithm portfolio in which
the classifier chooses among an exponential number of solving algorithms, each
imposing a subset of the set of possible nocrossing constraints. However, we do
not learn such a complex classifier (although in principle it could provide better
results), because the number of possible solvers would be too large; instead, we try to predict which of the single nocrossing constraints will be effective.
    Although less strictly related to this paper, we cite other approaches combining Machine Learning and CP, including Empirical Model Learning [28, 27], which learns some features of a physical system and includes its input/output relation as a new constraint, and approaches that try to learn single constraints or a whole constraint model given examples from the user [7, 4].
    Various works in the CP literature address the TSP or some of its variants.
Some of them propose implementations of the circuit constraint [10, 25, 19,
14]: considering that the Hamiltonian Circuit Problem is NP-complete, obtain-
ing Arc-Consistency for the circuit constraint would be NP-complete, so the
works in the literature forego the idea of achieving Arc-Consistency, and propose
instead different tradeoffs between computation time and pruning power.
    Considering the objective function, various works exploit relaxations (as usu-
ally done in Integer Linear Programming) to quickly rule out non-promising so-
lutions; the usual relaxations are the minimum spanning tree [30, 17, 18] and the
assignment problem relaxations [17, 18].
    Starting with Benchimol et al. [5], various works propose to integrate the objective function into a constraint that ensures the Hamiltonicity of the graph [15, 23, 12], in order to achieve stronger pruning based on the current upper bound.


8   Conclusions and Future work

In this paper, we proposed to predict which nocrossing constraints are worth imposing, and to select only those, in Euclidean TSPs formulated in CP.
    The first experimental results are encouraging, but more extensive experiments will be carried out. The performance of the solver with the constraints selected by the RF classifier will be compared with that obtained with the same number of constraints selected at random, to validate the relevance of the proposed approach. We also believe that much more efficiency could be obtained in future research.
    One first direction is refining the RF model to check if better performance
may be obtained, but also testing other supervised learning techniques. For instance, it would be interesting to predict, instead of two classes, the actual ratio of pruning power versus activations, $N_{pruned}/N_{activations}$, run experiments imposing only
those nocrossing constraints whose ratio is predicted to be higher than a given
threshold, and experimentally find the best possible value of the threshold. This
changes the machine learning step from a classification to a regression task,
which, on the other hand, may be more challenging. One could then move from
supervised learning to reinforcement learning techniques and thus avoid collect-
ing a training set.
    Some improvement could be obtained by expanding the dataset of experiments, i.e., by running experiments on more instances to widen the available data about the number of activations and the pruning of the nocrossing constraints. Also, in the current dataset only one search heuristic was employed, namely max regret [10]. Since max regret is a dynamic search heuristic, it might be the case that, when the set of nocrossing constraints changes, the search strategy explores the search tree radically differently, possibly shuffling the order in which nocrossing constraints are activated, and making effective some constraints that were not and vice-versa. So, a more precise classifier could be obtained by generating datasets with different search strategies.
    Another method could be to use data augmentation techniques; for example, by rotating all points in an instance by the same angle, one obtains the same TSP instance (and the optimal solution does not change), so in principle the number of activations should not change, while the input data would be different from the viewpoint of the machine learning algorithm, since the point coordinates would change.
    An enlargement of the set of features collected during the creation of the dataset is already planned, so that the learned classifier relies less on the point coordinates and more on the synthesized features (e.g., the number of edges crossed by the segment $\overline{P_iP_j}$ related to the constraint nocrossing(i, Next_i, j, Next_j)).
    Finally, instead of learning a classifier that chooses a priori the set of nocrossing constraints that should be imposed, one could use more dynamic strategies, such as removing during search the nocrossing constraints that turn out to be less effective because they have not obtained significant pruning in their last activations. One source of inspiration could be the strategies used in SAT solvers to forget some of the nogoods [21].


References
 1. Amrehn, M., Mualla, F., Angelopoulou, E., Steidl, S., Maier, A.: The random forest
    classifier in weka: Discussion and new developments for imbalanced data (2019)
 2. Applegate, D., Bixby, R.E., Chvátal, V., Cook, W.J.: TSP cuts which do not con-
    form to the template paradigm. In: Jünger, M., Naddef, D. (eds.) Computational
    Combinatorial Optimization, Optimal or Provably Near-Optimal Solutions [based
    on a Spring School, Schloß Dagstuhl, Germany, 15-19 May 2000]. Lecture Notes in
    Computer Science, vol. 2241, pp. 261–304. Springer (2001)
 3. Arora, S.: Polynomial time approximation schemes for Euclidean TSP and other ge-
    ometric problems. In: Proceedings of 37th Conference on Foundations of Computer
    Science. pp. 2–11 (Oct 1996)
 4. Beldiceanu, N., Simonis, H.: ModelSeeker: Extracting global constraint models
    from positive examples. In: Bessiere et al. [8], pp. 77–95.
    https://doi.org/10.1007/978-3-319-50137-6
 5. Benchimol, P., van Hoeve, W.J., Régin, J., Rousseau, L., Rueher, M.: Improved
    filtering for weighted circuit constraints. Constraints 17(3), 205–233 (2012)
 6. Bertagnon, A., Gavanelli, M.: Improved filtering for the euclidean traveling
    salesperson problem in CLP(FD). In: The Thirty-Fourth AAAI Conference
    on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applica-
    tions of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Sym-
    posium on Educational Advances in Artificial Intelligence, EAAI 2020, New
    York, NY, USA, February 7-12, 2020. pp. 1412–1419. AAAI Press (2020),
    https://aaai.org/ojs/index.php/AAAI/article/view/5498
 7. Bessiere, C., Daoudi, A., Hebrard, E., Katsirelos, G., Lazaar, N., Mechqrane, Y.,
    Narodytska, N., Quimper, C., Walsh, T.: New approaches to constraint acquisition.
    In: Bessiere et al. [8], pp. 51–76. https://doi.org/10.1007/978-3-319-50137-6
 8. Bessiere, C., De Raedt, L., Kotthoff, L., Nijssen, S., O’Sullivan, B., Pedreschi,
    D. (eds.): Data Mining and Constraint Programming - Foundations of a Cross-
    Disciplinary Approach, Lecture Notes in Computer Science, vol. 10101. Springer
    (2016). https://doi.org/10.1007/978-3-319-50137-6
 9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (Oct 2001).
    https://doi.org/10.1023/A:1010933404324
10. Caseau, Y., Laburthe, F.: Solving small TSPs with constraints. In: Naish, L. (ed.)
    Logic Programming, Proceedings of the Fourteenth International Conference on
    Logic Programming, Leuven, Belgium, July 8-11, 1997. pp. 316–330. MIT Press
    (1997)
11. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves.
    In: European Conference on Machine Learning (ECML 2006). pp. 233–240. ACM
    (2006)
12. Deudon, M., Cournut, P., Lacoste, A., Adulyasak, Y., Rousseau, L.: Learning
    heuristics for the TSP by policy gradient. In: van Hoeve, W.J. (ed.) Integration of
    Constraint Programming, Artificial Intelligence, and Operations Research - 15th
    International Conference, CPAIOR 2018, Delft, The Netherlands, June 26-29, 2018,
    Proceedings. Lecture Notes in Computer Science, vol. 10848, pp. 170–181. Springer
    (2018)
13. Edelsbrunner, H., Rote, G., Welzl, E.: Testing the necklace condition for shortest
    tours and optimal factors in the plane. Theor. Comput. Sci. 66(2), 157–180 (1989).
    https://doi.org/10.1016/0304-3975(89)90133-3
14. Fages, J., Lorca, X.: Improving the asymmetric TSP by considering graph struc-
    ture. CoRR abs/1206.3437 (2012), http://arxiv.org/abs/1206.3437
15. Fages, J., Lorca, X., Rousseau, L.: The salesman and the tree: the importance of
    search in CP. Constraints 21(2), 145–162 (2016)
16. Flood, M.M.: The traveling-salesman problem. Operations Research 4 (1956)
17. Focacci, F., Lodi, A., Milano, M.: Embedding relaxations in global constraints for
    solving TSP and TSPTW. Ann. Math. Artif. Intell. 34(4), 291–311 (2002)
18. Focacci, F., Lodi, A., Milano, M.: A hybrid exact algorithm for the TSPTW.
    INFORMS Journal on Computing 14(4), 403–417 (2002)
19. Francis, K.G., Stuckey, P.J.: Explaining circuit propagation. Constraints 19(1),
    1–29 (2014). https://doi.org/10.1007/s10601-013-9148-0
20. Garey, M.R., Graham, R.L., Johnson, D.S.: Some NP-complete geometric prob-
    lems. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Com-
    puting. pp. 10–22. STOC ’76, ACM, New York, NY, USA (1976)
21. Gent, I.P., Miguel, I., Moore, N.C.A.: An empirical study of learning and forgetting
    constraints. AI Commun. 25(2), 191–208 (2012). https://doi.org/10.3233/AIC-
    2012-0524
22. Held, M., Karp, R.M.: The traveling-salesman problem and minimum spanning
    trees. Operations Research 18, 1138–1162 (1970)
23. Isoart, N., Régin, J.C.: Integration of structural constraints into TSP models. In:
    International Conference on Principles and Practice of Constraint Programming.
    pp. 284–299. Springer (2019)
24. Johnson, D.S., McGeoch, L.A.: Experimental analysis of heuristics for the STSP. In:
    The traveling salesman problem and its variations, pp. 369–443. Springer (2007)
25. Kaya, L.G., Hooker, J.N.: A filter for the circuit constraint. In: Benhamou, F. (ed.)
    Principles and Practice of Constraint Programming - CP 2006, 12th International
    Conference, CP 2006, Nantes, France, September 25-29, 2006, Proceedings. Lecture
    Notes in Computer Science, vol. 4204, pp. 706–710. Springer (2006)
26. Kotthoff, L.: Algorithm selection for combinatorial search problems: A survey. In:
    Bessiere et al. [8], pp. 149–190. https://doi.org/10.1007/978-3-319-50137-6
27. Lombardi, M., Milano, M.: Boosting combinatorial problem modeling with machine
    learning. In: Lang, J. (ed.) Proceedings of the Twenty-Seventh International Joint
    Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm,
    Sweden. pp. 5472–5478. ijcai.org (2018). https://doi.org/10.24963/ijcai.2018/772
28. Lombardi, M., Milano, M., Bartolini, A.: Empirical decision model learning. Artif.
    Intell. 244, 343–367 (2017). https://doi.org/10.1016/j.artint.2016.01.005
29. Mitchell, J.S.B.: Guillotine subdivisions approximate polygonal subdivisions: A
    simple polynomial-time approximation scheme for geometric TSP, k-MST, and
    related problems. SIAM J. Comput. 28(4), 1298–1309 (1999)
30. Pesant, G., Gendreau, M., Potvin, J., Rousseau, J.: An exact constraint logic pro-
    gramming algorithm for the traveling salesman problem with time windows. Trans-
    portation Science 32(1), 12–29 (1998)
31. Provost, F.J., Fawcett, T.: Robust classification for imprecise environments. Ma-
    chine Learning 42(3), 203–231 (2001)
32. Régin, J.: A filtering algorithm for constraints of difference in CSPs. In: Hayes-
    Roth, B., Korf, R.E. (eds.) Proceedings of the 12th National Conference on Ar-
    tificial Intelligence, Seattle, WA, USA, July 31 - August 4, 1994, Volume 1. pp.
    362–367. AAAI Press / The MIT Press (1994)
33. Schimpf, J., Shen, K.: ECLiPSe – from LP to CLP. TPLP 12(1-2), 127–156 (2012)
34. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning
    Tools and Techniques. Morgan Kaufmann Series in Data Management Systems,
    Morgan Kaufmann, 3 edn. (2011)