Data-driven Policy on Feasibility Determination
        for the Train Shunting Problem
             (Extended Abstract) ?

         Paulo Roberto de Oliveira da Costa1 , Jason Rhuggenaath1 ( ),
        Yingqian Zhang1 , Alp Akcay1 , Wan-Jui Lee2 , and Uzay Kaymak1
                        1
                            Eindhoven University of Technology
                                   2
                                     Dutch Railways

Introduction. Parking, matching, scheduling, and routing are common problems
in train maintenance. In particular, train units are commonly maintained and
cleaned at dedicated shunting yards. The planning problem that results from
such situations is referred to as the Train Unit Shunting Problem (TUSP). This
problem involves matching arriving train units to service tasks and determining
the schedule for departing trains. The TUSP is an important problem as it is
used to determine the capacity of shunting yards and arises as a sub-problem of
more general scheduling and planning problems. In this paper, we consider the
case of the Dutch Railways (NS) TUSP. As the TUSP is complex, NS currently
uses a local search (LS) heuristic to determine if an instance of the TUSP has
a feasible solution. Given the number of shunting yards and the size of the
planning problems, improving the evaluation speed of the LS brings significant
computational gain. In this work, we use a machine learning approach that
complements the LS and accelerates the search process. We use a Deep Graph
Convolutional Neural Network (DGCNN) model to predict the feasibility of
solutions obtained during the run of the LS heuristic. We use this model to decide
whether to continue or abort the search process. In this way, the computation
time is used more efficiently as it is spent on instances that are more likely to be
feasible. Using simulations based on real-life instances of TUSP, we show how
our approach improves upon the previous method on prediction accuracy and
leads to computational gains for the decision-making process.
Methodology. Our methodology consists of three main steps: (i) generation
of starting solutions; (ii) a prediction model for feasibility of solutions and (iii)
a policy for interation with the LS. A schematic view of our approach can be
found in [1]. Each intermediate solution is represented as an activity graph of
maintenance activities. We then use a Deep Graph Convolutional Neural Network
(DGCNN) [2] as a feature extractor to train a model that predicts the future
feasibility of each solution. Afterwards, we combine the predictions with the LS
to derive a policy that decides to terminate the LS run based on the sequence of
intermediate solutions seen so far.
Main results and insights. We evaluate the classification performance of the
graph classification models (Table 1). We consider three types of prediction
models: (i) DGCNN-IS, (ii) DGCNN-IS-T and (iii) DGCNN-MS-T. DGCNN-IS
?
    Copyright c 2019 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0). The full paper can be
    found in [1].
uses only node labels and initial solutions during training. DGCNN-IS-T is the
same as DGCNN-IS but with additional temporal features. DGCNN-MS-T uses
additional node temporal features and the first 10% of the graphs in each instance
of a LS run for training. All models consider as class labels the final feasibility
status at the end of an LS run. We perform inference on the initial solutions on
the test dataset.
                                     Method                 DGCNN-IS                           DGCNN-IS-T               DGCNN-MS-T
                                     ACC(%)                 54.10 ± 2.37                       60.01 ± 1.67              59.96 ± 1.68
                                     TPR(%)                52.93 ± 11.47                        58.25 ± 9.10            82.08 ± 2.53
                                     TNR(%)                55.28 ± 10.27                       60.38 ± 7.37              37.86 ± 4.99

  Table 1. Different models. Adding time features improves performance by 10%.
    The main insight is that temporal features add important information for the
classification of feasible instances. These results motivate us to consider a variant
DGCNN-MS-T-W of DGCNN-MS-T that can be used alongside the local search
procedure. DGCNN-MS-T-W is similar to DGCNN-MS-T except that it predicts
feasibility of solutions that are W iterations ahead. Given an instance of TUSP,
we use the following threshold policy (see [1] for more details):
 1. Start the LS with the initial solution and run it for K iterations.
 2. For each iteration i where 0 ≤ i ≤ K, we pass the current solution Gi to
    DGCNN-MS-T-W to get a feasibility score φ(Gi ).
 3. If the average of the feasibility scores φ(Gi ) seen up to iteration K falls below
    some threshold 0 < αIF < 1, we stop the LS and classify the instance as
    infeasible.
                                                                                                                               Prediction Outcome
                             0.506                                               Infeasible                                 0: infeasible 1: feasible
                                                                                                                                                        Correct
                                                                                 Feasible                                                               Incorrect
                             0.504
                             0.502                                                                                                                      68.4%
         Feasibility Score


                                                                                                                             543           250
                                                                                                                        0
                             0.500                                                                                           33.9%         15.6%        31.6%
                                                                                                         Actual
                                                                                                         Value


                             0.498
                             0.496
                                                                                                                             284           523          64.8%
                             0.494                                                                                      1
                                                                                                                             17.7%         32.7%        35.2%
                             0.492
                                                                                                                  Correct      65.6%        67.6%       66.6%
                                     0   50   100    150 200 250           300   350     400
                                                    Number of Iterations                                      Incorrect        34.4%        32.4%       33.4%


     Fig. 1. Moving average over the last 10                                                           Table 2. Confusion matrix of
     iterations of the feasibility scores φ(Gi )                                                       one of the folds for the final
     averaged over cross-validation runs of the                                                        classification of DGCNN-MS-T-
     LS procedure                                                                                      W, with W = 150, K = 200,
                                                                                                       αIF = 0.5.

   Our results (see also Fig. 1 and Table 2) indicate that we can achieve a 8%
reduction in running time for a single instance, which can account for roughly 20
hours in total real time.
References
1. de O. da Costa, P.R., Rhuggenaath, J., Zhang, Y., Akcay, A., Lee, W., Kaymak, U.:
   Data-driven policy on feasibility determination for the train shunting problem. In:
   Machine Learning and Knowledge Discovery in Databases - European Conference,
   ECML PKDD (2019), to appear
2. Zhang, M., Cui, Z., Neumann, M., Chen, Y.: An end-to-end deep learning architecture
   for graph classification. In: AAAI-18. pp. 4438–4445 (2018)