<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean-Yves Lucas</string-name>
          <email>jean-yves.lucas@edf.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Triboulet</string-name>
          <email>thomas.triboulet@edf.fr</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm that learns a playout policy in order to solve a single player game. In this paper we apply NRPA to the vehicle routing problem. This problem is important for large companies that have to manage a fleet of vehicles on a daily basis. Real problems are often too large to be solved exactly. The algorithm is applied to standard problem of the literature and to the specific problems of EDF (Electricite´ De France, the main french electric utility company). These specific problems have peculiar constraints. NRPA gives better result than the algorithm previously used by EDF.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Monte Carlo Tree Search (MCTS) has been successfully applied to
many games and problems [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Nested Monte Carlo Search (NMCS) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is an algorithm that
works well for puzzles. It biases its playouts using lower level
playouts. At level zero NMCS adopts a uniform random playout
policy. Online learning of playout strategies combined with NMCS has
given good results on optimization problems [38]. Other
applications of NMCS include Single Player General Game Playing [32],
Cooperative Pathfinding [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Software testing [36], heuristic
ModelChecking [37], the Pancake problem [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Games [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Cryptography
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and the RNA inverse folding problem [33].
      </p>
      <p>
        Online learning of a playout policy in the context of nested
searches has been further developed for puzzles and optimization
with Nested Rollout Policy Adaptation (NRPA) [40]. NRPA has
found new world records in Morpion Solitaire and crosswords
puzzles. Stefan Edelkamp and co-workers have applied the NRPA
algorithm to multiple problems. They have optimized the algorithm
for the Traveling Salesman with Time Windows (TSPTW) problem
[
        <xref ref-type="bibr" rid="ref12 ref21">12, 21</xref>
        ]. Other applications deal with 3D Packing with Object
Orientation [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], the physical traveling salesman problem [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], the
Multiple Sequence Alignment problem [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] or Logistics [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The
principle of NRPA is to adapt the playout policy so as to learn the best
sequence of moves found so far at each level.
      </p>
      <p>Electricite´ De France (EDF) is the main french electric utility
company. Each year, eleven millions of services have to be carried
out by EDF technicians at customers places (problem fixing, meter
replacement, energetic diagnosis, business prospects). Thus, all
together, EDF techicians drive by car more than 220 000 kilometers
per year. Each service is accomplished by a technician having the
required skills within a time window determined during the
appointment booking. As a result, numerous Vehicle Routing Problems with
Time Windows (VRPTW) have to be solved on a daily basis (one
problem per catchment area). Any of these VRPTW involves quite a
big search space, since EDF technicians of a single catchment area
have to achieve every day hundreds of visits at customers places. The
objective of the work described in this paper was to adapt the NRPA
algorithm to the specifics of EDF problem.</p>
      <p>
        Related approaches that apply Artificial Intelligence techniques
to transportation include agent based approaches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Monte Carlo
Search approaches [
        <xref ref-type="bibr" rid="ref1 ref22">42, 30, 22, 1</xref>
        ] and learning of simulations of
urban traffic [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Our work is close to the Monte-Carlo Search
approach applied to the Capacitated Vehicle Routing Problem with
Time Windows (CVRPTW).
      </p>
      <p>We now give the outline of the paper. The next section briefly
describes the state of the art of Vehicle Routing. The third section
details the NRPA algorithm and its proposed improvements. The fourth
section describe the problems of EDF and the current solver. The fifth
section gives experimental results for various instances of Vehicle
Routing, both from literature (Solomon instances) and from real-life
problems. The last section concludes.
2</p>
    </sec>
    <sec id="sec-2">
      <title>The Vehicle Routing Problem</title>
      <p>
        The Vehicle Routing Problem (VRP) is one of the most studied
optimization problem. It was first stated in 1959 in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Basically, it
consists of finding an optimal route for a number of vehicles that are
used to deliver goods or services to a set of customers, taking into
account a set of constraints. In the simpler variant of the problem, all
vehicles start from a single depot and end their tour in this depot. The
objective function to be minimized may combine up to three criteria
(namely the number of customers that are not serviced, the number
of vehicles used, the total distance travelled by the vehicles), each
criterion having an associated weight. The VRP can be formalized
as a graph problem : let G = (V, A) be a directed graph where V
is the set of vertices, and A is the set of arcs. Each arc is labelled
with a non-negative number. One of the vertices represent the depot,
the others represent the customers locations. The arcs symbolize the
roads between two customers, and the label of the arcs give the
distance (or the travel time, or the travel cost) between two points. Lets
us remind that the G is a directed graph, and thus its associated
distance matrix is not necessarily symmetric.Now the problem consists
of finding the minimum number of tours so that each vertex except
the depot belongs to one and only one tour, and the depot belongs
to all of them. The VRP is of the non deterministic polynomial time
hard type (NP-Hard), which implies that so far we don’t know of
a general method which is able to optimally solve any instance of
the problem in polynomial time. As a result, the VRP is considered
very difficult. Nowadays exact methods can solve to optimality quite
challenging instances of the problem (for instance up to one hundred
visits or so with Branch and Price methods).
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>VRP variations</title>
      <p>The VRP is a real challenge for delivery companies operating a
fleet of vehicles. It has given rise to a number of variations. The
CVRP (Capacitated VRP) is defined as a VRP with a demand
associated to each customers (e.g. the number of parcels they have
purchased) and each vehicle has a limited capacity. The VRP with time
windows (VRPTW) implies to serve each customer within a given
time window (possibly different for each customer). The capacitated
VRP with time windows (CVRPTW) combines the characteristics
of the two previous variants. Although the previous variations are
the most widely studied because of their great practical importance,
many other extensions of the basic VRP have also been proposed.
Among them, let us just mention two of them. First, the dynamic
VRP (DVRP) where vehicles may be dynamically re-routed during
their tour in order to fulfill new customers orders. Second, the VRP
with Pickup and Delivery (VRPPD) where a fleet of vehicles must
satisfy transportation requests (e.g. picking parcels up at some places
and delivering them at given locations).
2.2</p>
    </sec>
    <sec id="sec-4">
      <title>Principal methods used for solving VRP</title>
      <p>A wide range of methods have been used to solve the VRP. These
methods may be broken down into three sub-groups, namely exact
methods, heuristic methods and metaheuristic methods. These
methods are presented hereafter.</p>
      <p>
        Exact methods These methods tend to find an optimal solution for
the VRP. As mentioned above, they are thus used to solve instances
of the VRP of consequent size. Among the most studied exact
methods for the VRP and variations we may mention the Branch-and-Cut
and the Branch-and-Price algorithms [28], the column-generation
algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the set-partitioning method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        These methods can also provide a bound of optimum, especially
in relaxing some of the most complicated constraints. For example,
in suppressing the classical subtour elimination constraint.
Heuristic methods Not in a comprehensive manner, let us
mention two interesting approaches. First the cluster-first, route-second
heuristic [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Second, the savings heuristic [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For the
local search approaches we see in [44] a heuristic projection pool
with powerful insertion and guided local search strategies. The
local search described in [39] gives reference solutions on several
instances of the benchmark VRP Solomon instances tested in this
work.
      </p>
      <p>
        Metaheuristic methods The most commonly used metaheuristics
for solving VRP and variations are the particle swarm optimization
[31], simulated annealing [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], genetic algorithm [34] [43] and tabu
search [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [35]. Among the best algorithms for solving the VRP
Solomon instances tested in this work, we can cite genetic algorithm
described in [29], evolutionnary algorithm detailed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or tabu
Search proposed in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-5">
      <title>Nested Rollout Policy Adaptation for Vehicle</title>
    </sec>
    <sec id="sec-6">
      <title>Routing</title>
      <p>In this section we start with explaining the NRPA algorithm. Then
we give our modelling of the CVRPTW problem for NRPA. We
finish the section describing how weight are heuristically initialized in
order to speed-up convergence of the algorithm.
3.1</p>
    </sec>
    <sec id="sec-7">
      <title>Description of NRPA</title>
      <p>
        An effective combination of nested levels of search [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and of policy
learning has been proposed with the NRPA algorithm [40]. NRPA
holds world records for Morpion Solitaire and crosswords puzzles.
      </p>
      <p>NRPA is given in algorithm 3. The principle is to learn weights
for the possible actions so as to bias the playouts. The playout
algorithm is given in algorithm 1. It performs Gibbs sampling, choosing
the actions with a probability proportional to the exponential of their
weights.</p>
      <p>The adaptive rollout policy is a policy parameterized by weights
on each action. During the playout phase, action is sampled
according to this weights. The playout algorithm is given in algorithm 1. It
uses Gibbs sampling, each move is associated to a weight. A move
is coded as an integer that gives the index of its weight in the policy
array of floats. The algorithm starts with initializing the sequence of
moves that it will play (line 2). Then it performs a loop until it reaches
a terminal states (lines 3-6). At each step of the playout it calculates
the sum of all the exponentials of the weights of the possible moves
(lines 7-10) and chooses a move proportional to its probability given
by the softmax function (line 11). Then it plays the chosen move and
adds it to the sequence of moves (lines 12-13).</p>
      <p>Then, the policy is adapted on the best current sequence found,
by increasing the weight of the best actions. The Adapt algorithm is
given in algorithm 2. The weights of the actions are updated at each
step of the algorithm so as to favor moves of the best sequence found
so far at each level. The principle of the adaptation is to add to
the action of the best sequence for each state encountered in the best
sequence (lines 3-5) and to decrease the weight of the other
possible actions by an amount proportional to their probabilities of being
played (lines 6-12). The adaptation algorithm is given in algorithm
2.</p>
      <p>In NRPA, each nested level takes as input a policy, and returns a
sequence. Inside the level, the algorithm makes many recursive calls
to lower levels, providing weights, getting sequences and adapting
the weights on those sequences. In the end, the algorithm returns the
best sequence found in that level. At the lowest level, the algorithm
simply makes a rollout.</p>
      <p>The NRPA algorithm is given in algorithm 3. At level zero it
simply performs a playout (lines 2-3). At greater levels it performs N
iterations and for each iteration it calls itself recursively to get a score
and a sequence (lines 4-7). If it finds a new best sequence for the
level it keeps it as the best sequence (lines 8-11). Then it adapts the
policy using the best sequence found so far at the current level (line
12).</p>
      <p>NRPA balances exploitation by adapting the probabilities of
playing moves toward the best sequence of the level, and exploration by
using Gibbs sampling at the lowest level. It is a general algorithm
that has proven to work well for many optimization problems.</p>
      <p>
        Playout policy adaptation has also been used for games such as Go
[27] or various other games with success [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
3.2
      </p>
    </sec>
    <sec id="sec-8">
      <title>Modelling of the problem</title>
      <p>There are several design choices when implementing NRPA. The first
choice is to decide how to code the possible moves. For the vehicle
routing problem we choose to code a move as a starting node, and
arrival node and a vehicle.More precisely, a solution to the problem
is an ordered sequence of visits, including special visits (SV) that
represent the fact that the corresponding technician is at the depot.
So a sequence solution always starts and ends with a SV. If there
Algorithm 1 The playout algorithm
1: playout (state, policy)
2: sequence []
3: while true do
4: if state is terminal then
5: return (score (state), sequence)
6: end if
7: z 0.0
8: for m in possible moves for state do
9: z z + exp (policy [code(m)])
10: end for
11: choose a move with probability exp(policy[czode(move)])
12: state play (state, move)
13: sequence sequence + move
14: end while
Algorithm 2 The Adapt algorithm
1: Adapt (policy, sequence)
2: polp policy
3: state root
4: for move in sequence do
5: polp [code(move)] polp [code(move)] +
6: z 0.0
7: for m in possible moves for state do
8: z z + exp (policy [code(m)])
9: end for
10: for m in possible moves for state do
11: polp [code(m)] polp [code(m)]
exp(policy[code(m)])</p>
      <p>z
is a SV between two visits within the sequence, that means the end
of a tour for a technician, and the beginning of a tour for another
technician (with no chronological ordering of these two tours that
will be carried out simultaneously).</p>
      <p>Another design choice is to define the score of a playout. Score
includes number of non visited customers multiplied by a great
penalization number, number of used vehicles multiplied by an rather
great weight, and number of kilometers. We filter movements that do
not respect the time windows in the playout algorithm : all solutions
respect customers time windows and vehicle time windows.</p>
      <p>If there were more objectives, a lexicographic ordering of the
objectives would be the best way to represent the scores of a playout.
Playouts could then be compared simply and easily.
3.3</p>
    </sec>
    <sec id="sec-9">
      <title>Initialization of the Weights</title>
      <p>In NRPA weights are uniformly initialized to 0.0. We propose to
initialize the weight of NRPA heuristically in order to speedup
convergence. The greater the distance from the current city to another city
the less likely the other city is a good choice. Standard NRPA starts
with an uniform policy with all weights set to 0.0 and does not
distinguishes between close and far cities. The heuristic initialization
of the weights initializes the weight with a value proportional to the
inverse of the distance. This initialization is only used for the first
iteration of NRPA.</p>
      <p>Algorithm 4 The NRPA algorithm with quantiles.
1: NRPA with quantile (level, policy)
2: allScores []
3: quantile 1
4: if level == 0 then
5: (result,new) playout(root,policy)
6: allScores allScores + result
7: quantile updateQuantile(allScores)
8: return (result,new)
9: else
10: bestScore 1
11: for N iterations do
12: (result,new) NRPA(level
13: if result bestScore then
14: bestScore result
15: seq new
16: else
17: if result quantile then
18: policy Adapt Bad(policy, new)
19: end if
20: end if
21: policy Adapt(policy, seq)
22: end for
23: return (bestScore, seq)
24: end if
1, policy)
3.4</p>
    </sec>
    <sec id="sec-10">
      <title>The Quantile Heuristic</title>
      <p>The objective of the Quantile Heuristic is to penalize movements
from bad solutions. Solution scores for all playouts need to be stored.
When a new playout is computed, if its score is worse than a defined
quantile, then its movements weights are decreased in the weight.
Efficient implementation is needed to limit the increase of computation
time needed to sort the solution scores and compute the quantile.
Sequence of algorithm is described in 4. The adapt function is the same</p>
    </sec>
    <sec id="sec-11">
      <title>The EDF Capacitated Vehicle Routing Problem with Time Windows</title>
      <p>In this section we first explain the optimization problem encountered
at EDF and then describe the current EDF approach to the problem.
4.1</p>
    </sec>
    <sec id="sec-12">
      <title>Description of the EDF problems</title>
      <p>The Vehicle Routing Problem modeled here includes time windows,
which is a classical feature of VRP problems. That means that :
Each technician has an availability time window. Each tour starts
at the beginning of this time window and must end before the end
of this time window.</p>
      <p>Each appointment has a time window and a duration. A tour
visiting the appointment must start after the beginning of the time
window and end (including the appointment duration) before the
end of the time window
If a tour arrives at an appointment location before the beginning of
its time windows, it is possible to add waiting time before starting
the appointment.</p>
      <p>Time windows and technicians are input data of the problem.
Duration of appointments and duration of all possible trajects between
appointments or depot and appointments are also input data of the
problem.</p>
      <p>The problem also includes capacities, another classical feature of
VRP problems : Each vehicle has several stock capacities (in EDF
problem 6 per vehicle).</p>
      <p>Each vehicle has several stock capacities (in EDF problem 6 per
vehicle). Vehicle stocks are initialized with initial capacity at the
begining of each tour.</p>
      <p>For any stock, each operation will consume a stock quantity.
For each tour, for each capacity, the sum of the quantity used by
the operation of the tour must not be greater than the capacity of
the stock.</p>
      <p>Several peculiar features are taken into account in EDF
modelization.
as for good solutions, with negative coefficient , as described in 5.
The coefficient used for bad solutions can be different from the one
used to adapt policy to good solutions.</p>
      <p>Taking into account an off-duty time window for lunch break. No
place is imposed for it. It can interrupt a trip but not a service to a
customer.</p>
      <p>Taking into account specific skills for visits. For each visit, only
a subset of technicians is skilled to carry out the technical
operations.</p>
      <p>Traject distance and traject duration are not proportionnal,
because vehicle mean speed depends on the traject. Consequently,
2 different traject matrixs are provided.</p>
      <p>Traject matrix and are not symetric : traject between two points
depends on the sense of the traject.</p>
      <p>Another difference with academic data is that on real problems,
there is not always enough people to carry out the whole set of
visits. Consequently, we must first maximize the number of visits that
will be done. Some of the appointments have a high priority, and
because of customers satisfaction, it is not possible to cancel it. Other
appointments have less priority, because they consist for example in
a technical operation that can be postponed, with no corresponding
customer appointment. Moreover, network preventive maintenance
visits have less priority than troubleshooting activities. In order to
evaluate a solution, a lexicographic objective function is taken into
account :</p>
      <p>Maximization of the number of achieved visits of high priority.
Maximization of the number of achieved visits of low priority.
Minimization of the economic function, taking into account
number of technicians used (ponderated with a proportional daily
wage) and number of kms (ponderated with a proportional km
cost).</p>
      <p>Thus, when comparing two solutions, we first compare the number
of visits of high priority that are achieved in each solution.if these
numbers are not equal then we know which solution is best. If they
are equal, then we compare the number of visits of low priority, if
again these numbers are equal, then we compare the third criterion
of the two solutions, that is the economic function. The problem is
solved each day for determining the technicians tours of the next day,
with a computing time which must not last more than an few hours.
4.2</p>
    </sec>
    <sec id="sec-13">
      <title>Description of current EDF approach</title>
      <p>Current approach for the problem is based on stochastic greedy
algorithm and variable neighborhood search. The stochastic greedy
algorithm is based on [41] algorithm. First phase consist in creating tours
and insert in each tour the visits that increases the less the travelled
distance, until there is no room for new visits. For each insertion,
the place in the tour is chosen to minimize the increase of kms. If
no insertion is possible, a new round is created. Then the process is
repeated, until no visits nor technicians are left.</p>
      <p>The local search consist of 2 kinds of movements :
Small movements consist in choosing randomly an appointment,
and finding the best place to move the appointment, i.e. the best
place in all the existing tours to reinsert the appointment.
Large movements consist in choosing randomly a tour, then
entirely destroying it and trying to reinsert all its appointments in
the other tours. If no improvement is possible, the movement is
canceled.</p>
      <p>Advantages of such a method is that it computes rapidly good
solutions, very often acceptable by operators because greedy algorithm
choices, based on distance, are humanly intuitive.</p>
    </sec>
    <sec id="sec-14">
      <title>Experimental Results</title>
      <p>The operational process of EDF consists in providing a solution to
the CVRPTW in less than two hours. Each problem is solved during
the night so as to provide solutions for the next day. The parameters
used in our experiments for NRPA are to use level 3, = 1 and 100
iterations per level. This algorithm is called nrpa in our tables. The
results are given out of 11 independent runs. The distance
heuristic is tested and the initialization of the weights is only applied at
the first iteration. The corresponding name of the algorithm is nrpaD
in the tables. The Quantile heuristic is applied in 80% of the worst
cases. It uses all the simulations performed to date and uses a specific
= 0:5. The algorithm that uses both the Distance and the Quantile
heuristics is named nrpaDQ int the tables.</p>
      <p>The current solver of EDF, Opturn is launched only once with
3 000 iterations of greedy search and 3 000 iterations of local search.
These values were chosen during the tuning of Opturn after many
experiments that showed that no improvements are obtained with
bigger values. As a result, the running time of Opturn is smaller than the
running time of the NRPA.</p>
      <p>The running time for NRPA is approximately 7 000 seconds and
up to 8 000 seconds when using the Quantile heuristic. The running
time of Opturn is approximately 5 000 seconds. Importantly, these
run times are compatible with the operational process as they last
approximately two hours.</p>
      <p>Table 1 gives the results of runs on the litterature instances. It
compares 4 algorithms:
nrpa : Standard NRPA (3 levels) without heuristics
nrpaD : NRPA (3 levels) with Distance initialization heuristic
nrpaDQ : NRPA (3 levels) with Distance initialization heuristic
and Quantile heuristic
opturn : current EDF solver, based on Solomon and LNS heuristic.
To compare the results of the 4 variants, the lexicographical approach
takes into account first the number of vehicles used and then the
kilometers.</p>
      <sec id="sec-14-1">
        <title>V : number of needed vehicles</title>
        <p>Km : sum of the Kms of the rounds</p>
        <p>Table 2 gives the summary for all the runs of the algorithms on the
standard problems. For the Km and the Vehicles (V), the figures are
the average of each algorthm on the Solomon instances. The value
is the number of problems that are solved better than with standard
NRPA minus the number of problems that are solved worse.
Comparison is lexicographic, based on number of vehicles, and if equals,
on the number of Kms. Logically, the value is 0 for the reference
(nrpa variant).</p>
        <p>We observe that nrpaDQ scores 55 while Opturn scores -50. This
is a great improvement over the current solver. However the current
solver was optimized on the EDF problems. The Distance heuristic
improves a lot over standard NRPA. The Distance + Quantile
heuristic only improves slightly over the Distance heuristic alone.</p>
        <p>Table 3 and Table 4 give the results of the 4 algorithms on the
EDF instances. For each algorithm, the 3 components of objective
function are displayed:</p>
      </sec>
      <sec id="sec-14-2">
        <title>Missed : number of missed appointments</title>
        <p>V : number of needed vehicles
Km : sum of the Kms of the tours</p>
        <p>Table 3 compares the results of runs of the algorithms on the EDF
problems. For each instance we compare the number of realized
technical operations because on real EDF instances there are not
necessarily enough vehicles to achieve all the operations. We also measure
the number of vehicles and the number of kilometers, as in classical
VRP problem instances. To compare the results of the 4 variants, the
lexicographical approach takes into account first the achieved
operations, then the number of vehicles used and finally the kilometers.</p>
        <p>Table 4 gives the average values on the whole set of EDF instances.</p>
        <p>The average number of missed visits is smaller with NRPA than
with Opturn. It is the same for all NRPA variants. The number of
vehicles used for the operations is also slightly smaller and this is
the same for all NRPA variants. The average number of kilometers is
greater with standard NRPA than with Opturn, however it is smaller
with nrpaD and nrpaDQ than with Opturn. With respect to the
lexicographic objective function described in subsection 4.1, we can
conclude that nrpaD and nrpaDQ improves on the current solver for the
operational EDF instances.</p>
        <p>The Distance and Quantile heuristics provide solutions with the
same values for the 3 components of the objective function. These
two heuristics perform better in average than the standard NRPA
variant as for the number of travelled kilometers : 324 vs 363, that
is a 10 percents improvement. If we compare with Opturn results,
the improvement account for around 5 percents. This is still
significant from a practical point of view because of the huge amount of
kilometers travelled by the technicians on a yearly basis : about 220
millions of kilometers.
6</p>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>Conclusion</title>
      <p>We have presented the NRPA algorithm and its application to the
CVRPTW problems. This problem is important for EDF, a company
that plans numerous operations every day on the french electrical
network. We have given the modelization used to address the CVRPTW
problem and two heuristics to improve on standard NRPA: the
Distance heuristic and the Quantile heuristic. The Distance heuristic
improves a lot on standard NRPA and the Quantile heuristic is a slight
improvement. We also compared NRPA with the heuristics to the
current EDF solver Opturn. On standard instances NRPA with the
heuristic is much better. On the operational EDF instances it is still
better even though Opturn was tuned for these instances while NRPA
is a general algorithm that uses a Distance heuristic that works for all
kinds of VRP problems.</p>
      <p>As we have seen from the examples of the EDF instances, NRPA
with heuristics performs better than the current solver. The
percentage of kilometers saved by heuristic NRPA is greater than 5% of the
total number of kilometers. EDF agents drive hundreds of millions of
kilometers each year. The use of heuristic NRPA could save millions
of kilometers each year and reduce the carbon footprint of EDF by
hundreds of tons of CO2.
hicle routing’, Networks, 11(2), 109–124, (1981).
[27] Tobias Graf and Marco Platzner, ‘Adaptive playouts in monte-carlo tree
search with policy-gradient reinforcement learning’, in Advances in
Computer Games - 14th International Conference, ACG 2015, Leiden,
The Netherlands, July 1-3, 2015, Revised Selected Papers, pp. 1–11,
(2015).
[28] G. Gutierrez-Jarpa, G. Desaulniers, G. Laporte, and Marianov V., ‘A
branch-and-price algorithm for the vehicle routing problem with
deliveries, selective pickups and time windows’, European Journal of
Operational Research, 206(12), 341–349, (2010).
[29] M. Barkaoui J. Berger and O. Bra¨ysy, ‘A parallel hybrid genetic
algorithm for the vehicle routing problem with time windows’, in Working
paper, Defense Research Establishment Valcartier, Canada, (2001).
[30] Jacek Ma n´dziuk and Cezary Nejman, ‘Uct-based approach to
capacitated vehicle routing problem’, in International Conference on
Artificial Intelligence and Soft Computing, pp. 679–690. Springer, (2015).
[31] Y. Marinakis, G-R. Iordanidou, and M. Marinaki, ‘Particle swarm
optimization for the vehicle routing problem with stochastic demands’,
Applied Soft Computing, 13, 1693–1704, (2013).
[32] Jean Me´hat and Tristan Cazenave, ‘Combining UCT and Nested Monte
Carlo Search for single-player general game playing’, IEEE
Transactions on Computational Intelligence and AI in Games, 2(4), 271–277,
(2010).
[33] Fernando Portela, ‘An unexpectedly effective monte carlo technique for
the rna inverse folding problem’, bioRxiv, 345587, (2018).
[34] J.-Y. Potvin and S. Bengio, ‘The vehicule routing problem with time
windows part ii : Genetic search’, INFORMS Journal on Computing,
8(2), 165–172, (1996).
[35] J.-Y. Potvin, T. Kervahut, B.-L. Garcia, and J.-M. Rousseau, ‘The
vehicule routing problem with time windows part i : Tabu search’,
INFORMS Journal on Computing, 8(2), 158–164, (1996).
[36] Simon M. Poulding and Robert Feldt, ‘Generating structured test data
with specific properties using nested monte-carlo search’, in Genetic
and Evolutionary Computation Conference, GECCO ’14, Vancouver,
BC, Canada, July 12-16, 2014, pp. 1279–1286, (2014).
[37] Simon M. Poulding and Robert Feldt, ‘Heuristic model checking using
a monte-carlo tree search algorithm’, in Proceedings of the Genetic and
Evolutionary Computation Conference, GECCO 2015, Madrid, Spain,
July 11-15, 2015, pp. 1359–1366, (2015).
[38] Arpad Rimmel, Fabien Teytaud, and Tristan Cazenave, ‘Optimization
of the Nested Monte-Carlo algorithm on the traveling salesman
problem with time windows’, in Applications of Evolutionary
Computation - EvoApplications 2011: EvoCOMNET, EvoFIN, EvoHOT,
EvoMUSART, EvoSTIM, and EvoTRANSLOG, Torino, Italy, April 27-29,
2011, Proceedings, Part II, volume 6625 of Lecture Notes in Computer
Science, pp. 501–510. Springer, (2011).
[39] Y. Rochat and E.D. Taillard, ‘Probabilistic diversification and
intensification in local search for vehicle routing’, in Journal of Heuristics,1,
pp. 147–167, (1995).
[40] Christopher D. Rosin, ‘Nested rollout policy adaptation for Monte</p>
      <p>Carlo Tree Search’, in IJCAI, pp. 649–654, (2011).
[41] Solomon, ‘Algorithms for the vehicle routing and scheduling problems
with time window constraints’, in Operations Research, (1985).
[42] Kenneth S o¨rensen and Marc Sevaux, ‘A practical approach for robust
and flexible vehicle routing using metaheuristics and monte carlo
sampling’, Journal of Mathematical Modelling and Algorithms, 8(4), 387,
(2009).
[43] S.R. Tangiah, K.E. Nygard, and Juell P.L., ‘Gideon : A genetic
algorithm system for vehicule routing with time window’, in IEEE CAIA
1991 - Proceedings of the 7th IEEE Conference on Artificial
Intelligence Applications, 24-28 February 1991, Miami beach, Florida, USA,
pp. 322–328, (1991).
[44] N. Yuichi and B. Olli, ‘A powerful route minimization heuristic for the
vehicle routing problem with time windows’, in Operations Research
Letters, Vol. 37, No. 5, pp. 333–338, (2009).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ashraf</given-names>
            <surname>Abdo</surname>
          </string-name>
          , Stefan Edelkamp, and Michael Lawo, '
          <article-title>Nested rollout policy adaptation for optimizing vehicle selection in complex vrps'</article-title>
          ,
          <source>in 2016 IEEE 41st Conference on Local Computer Networks Workshops (LCN Workshops)</source>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>221</lpage>
          . IEEE, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathur</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.M.</given-names>
            <surname>Salkin</surname>
          </string-name>
          , '
          <article-title>A set-partitioning-based algorithm for the vehicle routing problem'</article-title>
          ,
          <source>Networks</source>
          ,
          <volume>19</volume>
          (
          <issue>7</issue>
          ),
          <fpage>731</fpage>
          -
          <lpage>749</lpage>
          , (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Anbuudayasankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.C.</given-names>
            <surname>Lenny Koh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ducq</surname>
          </string-name>
          , '
          <article-title>Modified savings heuristics and genetic algorithm for bi-objective vehicle routing problem with forced backhauls'</article-title>
          ,
          <source>Expert Systems and Applications</source>
          ,
          <volume>30</volume>
          ,
          <fpage>2296</fpage>
          -
          <lpage>2305</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Azi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gendreau</surname>
          </string-name>
          , and J.-Y. Potvin, '
          <article-title>An exact algorithm for a vehicle routing problem with time windows and multiple use of vehicles'</article-title>
          ,
          <source>European Journal of Operational Research</source>
          ,
          <volume>202</volume>
          (
          <issue>3</issue>
          ),
          <fpage>756</fpage>
          -
          <lpage>763</lpage>
          , (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ana</surname>
            <given-names>LC</given-names>
          </string-name>
          <article-title>Bazzan and Franziska Klu¨gl, 'A review on agent-based technology for traffic and transportation'</article-title>
          ,
          <source>The Knowledge Engineering Review</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ),
          <fpage>375</fpage>
          -
          <lpage>403</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Bouzy</surname>
          </string-name>
          , '
          <article-title>Monte-carlo fork search for cooperative path-finding'</article-title>
          , in Computer Games - Workshop on Computer Games, CGW 2013,
          <article-title>Held in Conjunction with the 23rd</article-title>
          <source>International Conference on Artificial Intelligence, IJCAI</source>
          <year>2013</year>
          , Beijing, China,
          <source>August</source>
          <volume>3</volume>
          ,
          <year>2013</year>
          , Revised Selected Papers, pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Bouzy</surname>
          </string-name>
          , '
          <article-title>Burnt pancake problem: New lower bounds on the diameter and new experimental optimality ratios'</article-title>
          ,
          <source>in Proceedings of the Ninth Annual Symposium on Combinatorial Search, SOCS</source>
          <year>2016</year>
          ,
          <article-title>Tarrytown</article-title>
          ,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA, July 6-
          <issue>8</issue>
          ,
          <year>2016</year>
          , pp.
          <fpage>119</fpage>
          -
          <lpage>120</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Cameron</given-names>
            <surname>Browne</surname>
          </string-name>
          , Edward Powley, Daniel Whitehouse, Simon Lucas,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Cowling</surname>
          </string-name>
          , Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton, '
          <article-title>A survey of Monte Carlo tree search methods'</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI</source>
          in Games,
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          , (
          <year>March 2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Tristan</given-names>
            <surname>Cazenave</surname>
          </string-name>
          , '
          <article-title>Nested Monte-Carlo Search'</article-title>
          , in IJCAI, ed.,
          <source>Craig Boutilier</source>
          , pp.
          <fpage>456</fpage>
          -
          <lpage>461</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Tristan</surname>
            <given-names>Cazenave</given-names>
          </string-name>
          , '
          <article-title>Playout policy adaptation with move features'</article-title>
          ,
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>644</volume>
          ,
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Tristan</surname>
            <given-names>Cazenave</given-names>
          </string-name>
          , Abdallah Saffidine,
          <string-name>
            <surname>Michael John Schofield</surname>
          </string-name>
          , and Michael Thielscher, '
          <article-title>Nested monte carlo search for two-player games'</article-title>
          ,
          <source>in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, Arizona, USA, pp.
          <fpage>687</fpage>
          -
          <lpage>693</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Tristan</given-names>
            <surname>Cazenave</surname>
          </string-name>
          and Fabien Teytaud, '
          <article-title>Application of the nested rollout policy adaptation algorithm to the traveling salesman problem with time windows'</article-title>
          ,
          <source>in Learning and Intelligent Optimization - 6th International Conference, LION 6</source>
          , Paris, France,
          <source>January 16-20</source>
          ,
          <year>2012</year>
          , Revised Selected Papers, pp.
          <fpage>42</fpage>
          -
          <lpage>54</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.C.</given-names>
            <surname>Chiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Russel</surname>
          </string-name>
          , '
          <article-title>Simulated annealing metaheuristics for the vehicle routing problem with time windows'</article-title>
          ,
          <source>Annals of Operations Research</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>27</lpage>
          , (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Clarke</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wright</surname>
          </string-name>
          , '
          <article-title>Scheduling of vehicles from a central depot to a number of delivery points'</article-title>
          ,
          <source>Operations Research</source>
          ,
          <volume>12</volume>
          ,
          <fpage>171</fpage>
          -
          <lpage>183</lpage>
          , (
          <year>1964</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>J-F. Cordeau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Gendreau</surname>
          </string-name>
          , and G. Laporte, '
          <article-title>A tabu search heuristic for the periodic and multi-depot vehicle routing problems'</article-title>
          , Networks,
          <volume>30</volume>
          (
          <issue>2</issue>
          ),
          <fpage>105</fpage>
          -
          <lpage>119</lpage>
          , (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Luca</surname>
            <given-names>Crociani</given-names>
          </string-name>
          , Gregor La¨mmel, Giuseppe Vizzari, and Stefania Bandini, '
          <article-title>Learning obervables of a multi-scale simulation system of urban traffic</article-title>
          .',
          <string-name>
            <surname>in</surname>
            <given-names>ATT</given-names>
          </string-name>
          @ IJCAI, pp.
          <fpage>40</fpage>
          -
          <lpage>48</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bra</surname>
          </string-name>
          ¨ysy
          <string-name>
            <given-names>D.</given-names>
            <surname>Mester</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Dullaer</surname>
          </string-name>
          , '
          <article-title>A multi-parametric evolution strategies algorithm for vehicle routing problems'</article-title>
          , in Working Paper, Institute of Evolution, University of Haifa, Israel, (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.B.</given-names>
            <surname>Dantzig</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.H.</given-names>
            <surname>Ramser</surname>
          </string-name>
          , '
          <article-title>The truck dispatching problem'</article-title>
          ,
          <source>Management Science</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <fpage>80</fpage>
          -
          <lpage>91</lpage>
          , (
          <year>1959</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Ashutosh</given-names>
            <surname>Dhar</surname>
          </string-name>
          <string-name>
            <surname>Dwivedi</surname>
          </string-name>
          , Paweł Morawiecki, and Sebastian Wo´jtowicz, '
          <article-title>Finding differential paths in arx ciphers through nested monte-carlo search'</article-title>
          ,
          <source>International Journal of electronics and telecommunications</source>
          ,
          <volume>64</volume>
          (
          <issue>2</issue>
          ),
          <fpage>147</fpage>
          -
          <lpage>150</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>M. Gendreau F. Geurtin E. Taillard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Badeau</surname>
            and
            <given-names>J.Y.</given-names>
          </string-name>
          <string-name>
            <surname>Potvin</surname>
          </string-name>
          , '
          <article-title>A tabu search heuristic for the vehicle routing problem with time windows'</article-title>
          ,
          <source>in Transportation Science</source>
          ,
          <volume>31</volume>
          , pp.
          <fpage>170</fpage>
          -
          <lpage>186</lpage>
          , (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Stefan</surname>
            <given-names>Edelkamp</given-names>
          </string-name>
          , Max Gath, Tristan Cazenave, and Fabien Teytaud, '
          <article-title>Algorithm and knowledge engineering for the tsptw problem'</article-title>
          ,
          <source>in Computational Intelligence in Scheduling (SCIS)</source>
          ,
          <source>2013 IEEE Symposium on</source>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . IEEE, (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Stefan</surname>
            <given-names>Edelkamp</given-names>
          </string-name>
          , Max Gath, Christoph Greulich, Malte Humann, Otthein Herzog, and Michael Lawo, '
          <article-title>Monte-carlo tree search for logistics'</article-title>
          ,
          <source>in Commercial Transport</source>
          ,
          <fpage>427</fpage>
          -
          <lpage>440</lpage>
          , Springer, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Stefan</surname>
            <given-names>Edelkamp</given-names>
          </string-name>
          , Max Gath, and Moritz Rohde, '
          <article-title>Monte-carlo tree search for 3d packing with object orientation'</article-title>
          ,
          <source>in KI 2014: Advances in Artificial Intelligence</source>
          ,
          <fpage>285</fpage>
          -
          <lpage>296</lpage>
          , Springer International Publishing, (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Edelkamp</surname>
          </string-name>
          and Christoph Greulich, '
          <article-title>Solving physical traveling salesman problems with policy adaptation'</article-title>
          ,
          <source>in Computational Intelligence and Games (CIG)</source>
          ,
          <source>2014 IEEE Conference on</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE, (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Edelkamp</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhihao</given-names>
            <surname>Tang</surname>
          </string-name>
          , '
          <article-title>Monte-carlo tree search for the multiple sequence alignment problem'</article-title>
          , in Eighth Annual Symposium on Combinatorial Search, (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fisher</surname>
          </string-name>
          and R. Jaikumar, '
          <article-title>A generalized assignment heuristic for ve-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>