<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Graph Neural Networks and Deep Reinforcement Learning in Job Scheduling⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maroš Bratko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Seidelmann</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Academy of Sciences</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Czech Technical University in Prague</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Otto von Guericke University of Magdeburg</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Priority Dispatching Rules (PDRs) are greedy heuristic algorithms used to obtain approximate solutions to the NP-hard Job-shop Scheduling Problem (JSSP). The manual design of PDRs requires domain knowledge to achieve good performance, which varies with scenario properties and objectives. Recently, deep reinforcement learning has been used to automate the process of designing PDRs, where PDRs are formulated as a Markov Decision Process exploiting graph representation of JSSP, and a graph neural network (GNN) selects the operations to be dispatched. We experimentally compare five published models with source code publicly available on GitHub and our extensions of those models on three diferent variants of JSSP. Our experiments show that the choice of input features significantly afects the model's performance regardless of the GNN architecture. This suggests that the feature selection is essential for learning high-quality PDRs.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;job-shop scheduling</kwd>
        <kwd>graph neural networks</kwd>
        <kwd>deep reinforcement learning</kwd>
        <kwd>markov decision process</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>applying graph neural networks (GNNs) on a graph
representation of job scheduling problems.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Job scheduling considers the allocation of jobs to
resources with the goal of minimizing the makespan. Each This article considers five models published in
literajob consists of a sequence of operations, and each re- ture with source code publicly available on GitHub. Three
source can process only one operation at a time [1]. of those models solve JSSP, and two of them solve FJSP.</p>
      <p>Generally, the Job-shop Scheduling Problem (JSSP) as- We present our extensions of JSSP models to DJSP and
exsumes all jobs to be known apriori and each operation perimentally compare their performance on public
benchcan be allocated only to one given machine [1]. In a Flex- marks for JSSP and FJSP. To compare our DJSP extensions,
ible Job-shop Scheduling Problem (FJSP), each operation we model the DJSP as a Poisson process and generate
can be allocated to any of the machines from a given test instances from JSSP benchmarks.
subset of machines [2]. A Dynamic Job-shop
Scheduling Problem (DJSP), one of the more common variants,
tackles the stochastic aspects of modern manufacturing, 2. Problem formulation
e.g., the arrival of new jobs during the execution of the
schedule or uncertain processing times [3]. The Job-shop Scheduling Problem (JSSP) consists of a set</p>
      <p>Due to the NP-hardness of these problems [4], numer- of jobs  and a set of machines ℳ [1]. Each job has an
ous approaches and heuristics have been used over time associated sequence of operations  ∈ , which must
to yield approximate solutions, e.g. genetic algorithms be processed in the given order. Operation  represents
[5]. uninterrupted processing of job  ∈  on machine</p>
      <p>Priority Dispatching Rules (PDRs) [6] are a heuristic  ∈ ℳ with processing time  . Each machine can
method widely used in scheduling systems. Designing process only one operation at a time. A schedule is a set
a high quality PDR is usually a very time-consuming of start times  for each operation  that satisfies
task requiring extensive domain knowledge. Deep these constraints. The completion times  =  + 
reinforcement learning (DRL) has already been proposed denote the end of each operation. The JSSP solution
as a possible solution for automatizing algorithm is a schedule minimizing the total makespan max =
learning [7]. Several recent works have focused on max, { } [8].
extending this technique to job scheduling [8, 9, 10, 11],
Computational Intelligence and Data Mining - 11th international
workshop</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g ACttEribUutRion W4.0oInrtekrnsahtioonpal (PCCroBYce4.0e).dings (CEUR-WS.org)</p>
      <sec id="sec-2-1">
        <title>2.1. Flexible Job-shop Scheduling</title>
      </sec>
      <sec id="sec-2-2">
        <title>Problem</title>
        <p>A flexible Job-shop Scheduling Problem (FJSP) is an
extended version of JSSP with the only diference being
that each operation  ∈  can be processed on any
machine  from a given subset of machines ℳ ⊆ ℳ
with processing time  [10]. Solving FJSP then consists
of selecting the appropriate machine for each operation
in addition to determining its start time.
a way that the resulting graph is a directed acyclic graph
(DAG) [1].</p>
        <sec id="sec-2-2-1">
          <title>2.3.1. FJSP as a disjunctive graph</title>
          <p>The only diference with respect to JSSP is that each
operation can be part of multiple cliques. An example
of disjunctive graph representation for FJSP is shown in
Figure 2.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Heterogeneous Graph Representation</title>
        <p>JSSP as a heterogenous graph is  = (, , ), where
 is a set of operation nodes,  is a set of machines
2.2. Dynamic Job-shop Scheduling nodes, and  is a set of edges [11].</p>
        <p>Problem edgEea,cmheadchgiencea-ntob-emeaicthheinreop(era− tion-t)oe-dogpee,raotrioonpe(ra−
tion)A dynamic Job-shop Scheduling Problem (DJSP) is a dy- to-machine ( −  ) edge.  −  edges fully connect
namic version of JSSP, where the jobs are not released all all operations in the same job, and all machines are fully
at once, but at diferent times throughout the execution. connected via  −  edges.  −  edge connects
opWe assume that  jobs are known at the beginning of the erations with machines on which they can be processed.
schedule and ′ jobs arrive after the start of the schedule
[12, 13]. 2.4.1. FJSP as a Heterogeneous Graph</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.3. Disjunctive Graph Representation</title>
        <p>JSSP can be represented as a disjunctive graph  =
(, , ), where  denotes a set of vertices
corresponding to diferent operations  weighted by the
processing time together with a start node  and a target node 
with processing time equal to zero, representing start and
end of the schedule, respectively [1].  is a set of arcs
representing precedence constraints.  = ⋃︀  is a set
of edges, where  is a clique connecting operations that
require the same machine  for their execution. An
example of a JSSP instance represented as a disjunctive
graph is shown in Figure 1.</p>
        <p>Finding a solution to the Job-shop Scheduling Problem
can be viewed as denfiing the ordering between
operations requiring the same machine. In the disjunctive
graph, this is done by turning all edges into arcs in such
FJSP as a heterogenous graph is defined as  =
(, , , Σ) [10]. Set of operation nodes  and set of
arcs  is the same as in the disjunctive graph. A set of
machine nodes  represents machines, and a set of edges Σ
connects operation nodes and machine nodes on which
they can be processed. An example of a heterogeneous
graph for FJSP is shown in Figure 3.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Priority Dispatching Rules</title>
        <p>PDRs are a greedy heuristic method for solving JSSP in
|| steps [8]. For each eligible operation, PDR computes
a priority index and selects the one with the highest
priority to be dispatched. In FJSP, PDR also selects the machine.
Traditional PDRs compute the priority index based on
the set of features for each operation [13]. Traditional
PDRs include:
Decisions made by PDRs can then be viewed as actions
changing the disjunctive graph. This process can then
be formulated as a Markov Decision Process (MDP) [8],
allowing PDRs to be learned automatically via deep
reinforcement learning techniques. State  at timestep  is a
graph as described in Section 2.3 and Section 2.4. Action
 at timestep  is an unscheduled operation whose
preceding operations have already been scheduled. A state
transition from  to +1 after executing  is done by
updating the graph. The reward for each action is the
diference between the max() and max(+1), where
max() is the lower bound of the makespan in state
. The cumulative reward with discount factor  = 1 is
max(0)− max(||), where max(||) corresponds to
the makespan of the final schedule max, and max(0) is
a constant specific to the problem instance. Maximizing
cumulative reward minimizes final schedule makespan
max [8, 10, 11]. Each model presented in the next section
uses their own modified version of this MDP, which we
will not discuss in detail.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Models with source code</title>
      <p>In this section, we will briefly present five models from
the literature with available source code. These models
used GNNs and DRL to solve JSSP and FJSP. We named
each model after its GitHub repository. Source code of
all models, experiments, and data used in this paper can
be found on GitHub (link).</p>
      <sec id="sec-3-1">
        <title>3.1. JSSP Models</title>
        <p>L2D is a model published in [8], source code was
obtained from [15]. It employs a modified
disjunctive graph shown in Figure 4, where the authors
start with the original disjunctive graph containing
only conjunctive arcs, and after each dispatched
operation, they add a conjunctive arc representing
a new precedence constraint between operations on
the same machine. To train the model, the authors
used Proximal Policy Optimization (PPO) algorithm [16].
Wheatley is an open-source model published in
[17], with source code available on GitHub [18]. It uses a
similar "arc adding scheme" as L2D. To train the model,
the authors used PPO algorithm [16].</p>
        <p>IEEE-ICCE-RL-JSP was published in [11], the
code was obtained from the GitHub repository [19].
It represents JSSP as a heterogeneous graph. In each
step, this model chooses one of the traditional PDRs to
determine the operation to dispatch. To train the model,
the authors used a Double Deep-Q Network (DDQN)
algorithm [20].</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. FJSP Models</title>
        <p>End-to-end-DRL-for-FJSP was published in [14], the
source code was obtained from [21]. It represents FJSP as
a disjunctive graph. It uses a similar "adding arc scheme"
as L2D, which is shown in Figure 5 for an FJSP. To train
the agent, the authors used a multi-agent version of the
PPO algorithm.</p>
        <p>The model fjsp-drl was published in [10], and the source
code was obtained from [22]. It represents FJSP as a
heterogeneous graph. To train the model, the authors
used their own modified version of PPO algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Extensions</title>
      <p>In this section, we will present our extensions of JSSP
models to DJSP, namely for L2D, Wheatley, and
IEEEICCE-RL-JSP.</p>
      <sec id="sec-4-1">
        <title>4.1. Dynamic L2D</title>
        <p>The main idea behind our algorithm is that when a new
job new arrives at the time , the algorithm reschedules
all operations that have not started yet. The complete
algorithm pseudocode is shown in Algorithm 1. We treat
L2D as a black box, which takes a JSSP instance and a
list of actions as input (containing the already started
operations), performs actions from the list in a given
order, dispatches the remaining operations using its GNN,
and outputs a solution. The already started operations,
and their start times, are kept in a list called plan. After
each new job arrival, plan is updated with operations
that have started since the previous job arrived, the new
job is added to the instance, and L2D is used to dispatch
the rest of the operations.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Dynamic Wheatley</title>
        <p>We consulted Wheatley’s authors in the GitHub issue
[23]. We were advised to add a "virtual" operation at the
start of the job processed by a "virtual" machine with the
processing time equal to the arrival time . This "virtual"
operation guarantees, that the first "real" operation will
start after . To avoid diferent jobs interfering with
Algorithm 1 Dynamic L2D</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment</title>
      <p>We ran all experiments on a MacBook Pro with an Apple
M2 Max chip, 32 GB of RAM, and Sonoma 14.4.1 macOS
operating system.</p>
      <p>IEEE-ICCE-RL-JSP. We used the provided
train5.1.1. Instances ing script, which trains the model until the gap is smaller
JSSP. We obtained 242 benchmark instances, and their than 20%. We trained five checkpoints. We trained the
best-observed solutions from [24]. We reformulate model using CPU on an Arm-based Ampere A1 virtual
each JSSP instance as its equivalent FJSP instance with machine with 4 CPUs and 24 GB of RAM in Oracle Cloud
|ℳ | = 1, so FJSP models can process JSSP instances. Infrastructure with Ubuntu 20.04 64-bit operating system.
For each model and JSSP instance, we run the experiment
with 10 diferent seeds. We are interested in the gap given End-to-end-DRL-for-FJSP. We obtained only
by the following equation one checkpoint from the GitHub repository [21].
 =  − best ,
best
(1)
jfsp-drl . We obtained five checkpoints from the
GitHub repository for this model [22].
5.2. Results
5.2.1. JSSP
∆ avg =   , (2)</p>
      <p>where   is the average processing time of all operations,
and  is a load factor of the dynamic job shop. We use
 ∈ {1, 4} in the experiment.</p>
      <p>We generate the DJSP instances from the JSSP
instances and consider half the jobs as known and the
rest as arriving jobs. We repeat each experiment with
10 diferent seeds. We are interested in the makespan
because the gap is not available.</p>
      <p>Average gaps are shown in Table 1. The boxplot of JSSP
gaps is shown in Figure 6. Average runtimes are in
Table 2.</p>
      <p>We rejected the null hypothesis that the medians of the
performance of the three best models (MWKR, fjsp-drl ,
and IEEE-ICCE-RL-JSP) are equal using the
KruskalWallist test [27] ( &lt; 5%). Therefore we can also reject
the null hypothesis that the medians of the performance
5.1.2. Model checkpoints of all models are equal.</p>
      <p>We compared all models pairwise using the
MannL2D. From the L2D GitHub repository [15], we obtained Whitney U test [28] corrected for multiple hypotheses
checkpoints trained on random instances with size 6x6, testing using the Holm method [29]. The best three
mod10x10, 15x15, 20x15, 20x20, 30x15, 30x20. els were significantly diferent from the rest. The
pairwise comparison of the three best models is shown in
where  is the makespan produced by the model, and
best is the best-observed makespan.</p>
      <p>FJSP. We obtained 402 benchmark instances, and
their best-observed solutions from [25]. Again, we
repeat the experiment for each model and each FJSP
instance with 10 diferent seeds and calculate the gap.</p>
      <p>DJSP. We designed an experiment inspired by
[26]. We assume a set of known jobs at the beginning.</p>
      <p>We then model the arrival of new jobs as a Poisson
process, i.e., the arrival of two consecutive jobs follows
an exponential distribution [26], where the average
arrival time is</p>
      <sec id="sec-5-1">
        <title>5.1.3. Baselines</title>
        <p>JSSP. We compared the models with the 9 classic PDRs
listed in Section 2.5. We used implementations given by
the IEEE-ICCE-RL-JSP model as obtained from the
GitHub repository [19].</p>
        <p>FJSP. For FJSP, we used PDR implementations
from the code of the model End-to-end-DRL-for-FJSP.
For operation selection, we used FIFO, MOPNR, LWKR,
and MWKR. For machine selection, we used EET and
SPT.</p>
        <p>EDD FIFO LOR LPT MWKR LS MOR ESnPdT-to-enSdR-PDTRL-for-FJSP fIjEsEpE-d-IrClCE-RL-JSP l2dwheatley
Average FJSP gaps for diferent models are shown in
Table 4. Average runtimes are shown in Table 5. A boxplot
for FJSP gaps is shown in Figure 7 with logarithmic
vertical scale.</p>
        <p>We rejected the null hypothesis that the medians of
gaps of all models and PDRs are equal using the
KruskalWallis test ( &lt; 5%). We also compared models and
PDRs using the Holm-corrected Mann-Whitney U test
[28]. Almost all pairwise comparisons produced
statistically significant diferences (  &lt; 5%). Comparisons
that did not produce statistically significant diferences
are MOPNR-SPT vs. MWKR-EET, MOPNR-SPT vs.
MWKR-SPT, and MWKR-EET vs. MWKR-SPT.</p>
        <p>MEWndK-Rto-S-ePnTd-DRL-for-FJSP
fjsp-drl</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>6.1. JSSP
The results for MWKR we obtained are much better than
the results reported by the authors of L2D [8]. We did
not find their implementation of MWKR, so we could
not investigate the diference.
Average DJSP makespan with  = 1; the lowest values are
bold; "**" denotes best model better than the other two
models, "*" denotes best model better only than the last model or
the second best model better than the last model</p>
      <p>Good performance of MWKR hints at the importance
of the sum of processing times of remaining operations
as an operation feature. The model IEEE-ICCE-RL-JSP
used the number of remaining operations in the current
job as an operation feature. The model fjsp-drl also
included the number of unscheduled operations in the
job as a feature. Other models didn’t use this feature.
6.2. FJSP
The machine selection PDR SPT consistently performs
significantly better than EET. The model
End-to-endDRL-for-FJSP includes processing time as a feature
ex6.3. DJSP
The worse results yielded by Wheatley for all load
factors can be a consequence of the poor performance in
solving static JSSP.</p>
      <p>The result of this experiment indicates that for more
sparsely arriving jobs, L2D features (the lower bound
of the estimated completion time) are more important.
As the load factor increases, a DJSP is more similar to a
static JSSP, and other features, which are more important
for a JSSP, become more important, too.
plicitly in the machine embeddings. The model fjsp-drl
includes the time when the machine will finish all its
currently assigned operations as a feature which is similar to
the machine selection PDR EET. This may explain why
the model fjsp-drl yielded worse results.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and future work</title>
      <p>In this paper, we compared five job scheduling models
with available source code. We extended three of those
models to the dynamic variant of job-shop scheduling.
From our experiments, we observed that the selection
of input features had a much more significant impact on
model performance than the architecture of the model. In
JSSP, the remaining work seemed to be the most crucial.
In FJSP, it was processing time during machine selection.
In DJSP, the lower bound of the estimated completion
time seemed more influential for sparsely incoming jobs.
For more densely incoming jobs, the remaining work
seemed to be more relevant. Investigating the efect of
existing input features and searching for new ones may
be a valuable direction for future work.
CoRR abs/1707.06347 (2017). URL: http://arxiv.org/
abs/1707.06347. arXiv:1707.06347.
[17] G. Infantes, S. Roussel, P. Pereira, A. Jacquet, E.
Benazera, Learning to solve job shop scheduling
under uncertainty, in: B. Dilkina (Ed.), CPAIOR 2024:
Integration of Constraint Programming, Artificial
Intelligence and Operations Research (20th
International Conference), Springer Nature, 2024.
[18] jolibrain, wheatley, https://github.com/jolibrain/
wheatley/, 2024. Accessed 2024-01-09.
[19] Jerry-Github-Cloud, Ieee-icce-rl-jsp, https://github.
com/Jerry-Github-Cloud/IEEE-ICCE-RL-JSP, 2023.</p>
      <p>Accessed 2024-03-17.
[20] H. van Hasselt, A. Guez, D. Silver, Deep
reinforcement learning with double q-learning,
Proceedings of the AAAI Conference on Artificial
Intelligence 30 (2016). URL: https://ojs.aaai.org/index.
php/AAAI/article/view/10295. doi:10.1609/aaai.
v30i1.10295.
[21] L. Kun, End-to-end-drl-for-fjsp, https://github.com/
Lei-Kun/End-to-end-DRL-for-FJSP, 2022. Accessed
2024-03-17.
[22] W. Song, fjsp-drl, https://github.com/songwenas12/
fjsp-drl, 2023. Accessed 2024-03-22.
[23] jolibrain, wheatley - diferent arrival times
feature request, https://github.com/jolibrain/wheatley/
issues/89, 2024. Accessed 2024-01-09.
[24] Job shop instances and solutions, http://jobshop.</p>
      <p>jjvh.nl/index.php, 2024. Accessed: 2024-03-28.
[25] D. Behnke, M. J. Geiger, Test instances for the
lfexible job shop scheduling problem with work
centers (2012). doi:10.24405/436.
[26] K. Lei, P. Guo, Y. Wang, J. Zhang, X. Meng, L. Qian,
Large-scale dynamic scheduling for flexible
jobshop with random arrivals of new jobs by
hierarchical reinforcement learning, IEEE
Transactions on Industrial Informatics 20 (2024) 1007–1018.
doi:10.1109/TII.2023.3272661.
[27] W. H. Kruskal, W. A. Wallis, Use of ranks in
onecriterion variance analysis, Journal of the American
Statistical Association 47 (1952) 583–621. doi:10.
1080/01621459.1952.10483441.
[28] H. B. Mann, D. R. Whitney, On a test of whether
one of two random variables is stochastically larger
than the other, The annals of mathematical
statistics (1947) 50–60.
[29] S. Garcia, F. Herrera, An extension on" statistical
comparisons of classifiers over multiple data sets"
for all pairwise comparisons., Journal of machine
learning research 9 (2008).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>