-

Graph Neural Networks and Deep Reinforcement Learning in Job Scheduling⋆

Maroš Bratko

Thomas Seidelmann

Martin Holeňa

0 1 2 0 Charles University in Prague 1 Czech Academy of Sciences 2 Czech Technical University in Prague 3 Otto von Guericke University of Magdeburg

Priority Dispatching Rules (PDRs) are greedy heuristic algorithms used to obtain approximate solutions to the NP-hard Job-shop Scheduling Problem (JSSP). The manual design of PDRs requires domain knowledge to achieve good performance, which varies with scenario properties and objectives. Recently, deep reinforcement learning has been used to automate the process of designing PDRs, where PDRs are formulated as a Markov Decision Process exploiting graph representation of JSSP, and a graph neural network (GNN) selects the operations to be dispatched. We experimentally compare five published models with source code publicly available on GitHub and our extensions of those models on three diferent variants of JSSP. Our experiments show that the choice of input features significantly afects the model's performance regardless of the GNN architecture. This suggests that the feature selection is essential for learning high-quality PDRs.

eol>job-shop scheduling graph neural networks deep reinforcement learning markov decision process

applying graph neural networks (GNNs) on a graph representation of job scheduling problems.

1. Introduction

Job scheduling considers the allocation of jobs to resources with the goal of minimizing the makespan. Each This article considers five models published in literajob consists of a sequence of operations, and each re- ture with source code publicly available on GitHub. Three source can process only one operation at a time [1]. of those models solve JSSP, and two of them solve FJSP.

Generally, the Job-shop Scheduling Problem (JSSP) as- We present our extensions of JSSP models to DJSP and exsumes all jobs to be known apriori and each operation perimentally compare their performance on public benchcan be allocated only to one given machine [1]. In a Flex- marks for JSSP and FJSP. To compare our DJSP extensions, ible Job-shop Scheduling Problem (FJSP), each operation we model the DJSP as a Poisson process and generate can be allocated to any of the machines from a given test instances from JSSP benchmarks. subset of machines [2]. A Dynamic Job-shop Scheduling Problem (DJSP), one of the more common variants, tackles the stochastic aspects of modern manufacturing, 2. Problem formulation e.g., the arrival of new jobs during the execution of the schedule or uncertain processing times [3]. The Job-shop Scheduling Problem (JSSP) consists of a set

Due to the NP-hardness of these problems [4], numer- of jobs and a set of machines ℳ [1]. Each job has an ous approaches and heuristics have been used over time associated sequence of operations ∈ , which must to yield approximate solutions, e.g. genetic algorithms be processed in the given order. Operation represents [5]. uninterrupted processing of job ∈ on machine

Priority Dispatching Rules (PDRs) [6] are a heuristic ∈ ℳ with processing time . Each machine can method widely used in scheduling systems. Designing process only one operation at a time. A schedule is a set a high quality PDR is usually a very time-consuming of start times for each operation that satisfies task requiring extensive domain knowledge. Deep these constraints. The completion times = + reinforcement learning (DRL) has already been proposed denote the end of each operation. The JSSP solution as a possible solution for automatizing algorithm is a schedule minimizing the total makespan max = learning [7]. Several recent works have focused on max, { } [8]. extending this technique to job scheduling [8, 9, 10, 11], Computational Intelligence and Data Mining - 11th international workshop

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g ACttEribUutRion W4.0oInrtekrnsahtioonpal (PCCroBYce4.0e).dings (CEUR-WS.org)

2.1. Flexible Job-shop Scheduling Problem

A flexible Job-shop Scheduling Problem (FJSP) is an extended version of JSSP with the only diference being that each operation ∈ can be processed on any machine from a given subset of machines ℳ ⊆ ℳ with processing time [10]. Solving FJSP then consists of selecting the appropriate machine for each operation in addition to determining its start time. a way that the resulting graph is a directed acyclic graph (DAG) [1].

2.3.1. FJSP as a disjunctive graph

The only diference with respect to JSSP is that each operation can be part of multiple cliques. An example of disjunctive graph representation for FJSP is shown in Figure 2.

2.4. Heterogeneous Graph Representation

JSSP as a heterogenous graph is = (, , ), where is a set of operation nodes, is a set of machines 2.2. Dynamic Job-shop Scheduling nodes, and is a set of edges [11].

Problem edgEea,cmheadchgiencea-ntob-emeaicthheinreop(era− tion-t)oe-dogpee,raotrioonpe(ra− tion)A dynamic Job-shop Scheduling Problem (DJSP) is a dy- to-machine ( − ) edge. − edges fully connect namic version of JSSP, where the jobs are not released all all operations in the same job, and all machines are fully at once, but at diferent times throughout the execution. connected via − edges. − edge connects opWe assume that jobs are known at the beginning of the erations with machines on which they can be processed. schedule and ′ jobs arrive after the start of the schedule [12, 13]. 2.4.1. FJSP as a Heterogeneous Graph

2.3. Disjunctive Graph Representation

JSSP can be represented as a disjunctive graph = (, , ), where denotes a set of vertices corresponding to diferent operations weighted by the processing time together with a start node and a target node with processing time equal to zero, representing start and end of the schedule, respectively [1]. is a set of arcs representing precedence constraints. = ⋃︀ is a set of edges, where is a clique connecting operations that require the same machine for their execution. An example of a JSSP instance represented as a disjunctive graph is shown in Figure 1.

Finding a solution to the Job-shop Scheduling Problem can be viewed as denfiing the ordering between operations requiring the same machine. In the disjunctive graph, this is done by turning all edges into arcs in such FJSP as a heterogenous graph is defined as = (, , , Σ) [10]. Set of operation nodes and set of arcs is the same as in the disjunctive graph. A set of machine nodes represents machines, and a set of edges Σ connects operation nodes and machine nodes on which they can be processed. An example of a heterogeneous graph for FJSP is shown in Figure 3.

2.5. Priority Dispatching Rules

PDRs are a greedy heuristic method for solving JSSP in || steps [8]. For each eligible operation, PDR computes a priority index and selects the one with the highest priority to be dispatched. In FJSP, PDR also selects the machine. Traditional PDRs compute the priority index based on the set of features for each operation [13]. Traditional PDRs include: Decisions made by PDRs can then be viewed as actions changing the disjunctive graph. This process can then be formulated as a Markov Decision Process (MDP) [8], allowing PDRs to be learned automatically via deep reinforcement learning techniques. State at timestep is a graph as described in Section 2.3 and Section 2.4. Action at timestep is an unscheduled operation whose preceding operations have already been scheduled. A state transition from to +1 after executing is done by updating the graph. The reward for each action is the diference between the max() and max(+1), where max() is the lower bound of the makespan in state . The cumulative reward with discount factor = 1 is max(0)− max(||), where max(||) corresponds to the makespan of the final schedule max, and max(0) is a constant specific to the problem instance. Maximizing cumulative reward minimizes final schedule makespan max [8, 10, 11]. Each model presented in the next section uses their own modified version of this MDP, which we will not discuss in detail.

3. Models with source code

In this section, we will briefly present five models from the literature with available source code. These models used GNNs and DRL to solve JSSP and FJSP. We named each model after its GitHub repository. Source code of all models, experiments, and data used in this paper can be found on GitHub (link).

3.1. JSSP Models

L2D is a model published in [8], source code was obtained from [15]. It employs a modified disjunctive graph shown in Figure 4, where the authors start with the original disjunctive graph containing only conjunctive arcs, and after each dispatched operation, they add a conjunctive arc representing a new precedence constraint between operations on the same machine. To train the model, the authors used Proximal Policy Optimization (PPO) algorithm [16]. Wheatley is an open-source model published in [17], with source code available on GitHub [18]. It uses a similar "arc adding scheme" as L2D. To train the model, the authors used PPO algorithm [16].

IEEE-ICCE-RL-JSP was published in [11], the code was obtained from the GitHub repository [19]. It represents JSSP as a heterogeneous graph. In each step, this model chooses one of the traditional PDRs to determine the operation to dispatch. To train the model, the authors used a Double Deep-Q Network (DDQN) algorithm [20].

3.2. FJSP Models

End-to-end-DRL-for-FJSP was published in [14], the source code was obtained from [21]. It represents FJSP as a disjunctive graph. It uses a similar "adding arc scheme" as L2D, which is shown in Figure 5 for an FJSP. To train the agent, the authors used a multi-agent version of the PPO algorithm.

The model fjsp-drl was published in [10], and the source code was obtained from [22]. It represents FJSP as a heterogeneous graph. To train the model, the authors used their own modified version of PPO algorithm.

4. Proposed Extensions

In this section, we will present our extensions of JSSP models to DJSP, namely for L2D, Wheatley, and IEEEICCE-RL-JSP.

4.1. Dynamic L2D

The main idea behind our algorithm is that when a new job new arrives at the time , the algorithm reschedules all operations that have not started yet. The complete algorithm pseudocode is shown in Algorithm 1. We treat L2D as a black box, which takes a JSSP instance and a list of actions as input (containing the already started operations), performs actions from the list in a given order, dispatches the remaining operations using its GNN, and outputs a solution. The already started operations, and their start times, are kept in a list called plan. After each new job arrival, plan is updated with operations that have started since the previous job arrived, the new job is added to the instance, and L2D is used to dispatch the rest of the operations.

4.2. Dynamic Wheatley

We consulted Wheatley’s authors in the GitHub issue [23]. We were advised to add a "virtual" operation at the start of the job processed by a "virtual" machine with the processing time equal to the arrival time . This "virtual" operation guarantees, that the first "real" operation will start after . To avoid diferent jobs interfering with Algorithm 1 Dynamic L2D

5. Experiment

We ran all experiments on a MacBook Pro with an Apple M2 Max chip, 32 GB of RAM, and Sonoma 14.4.1 macOS operating system.

IEEE-ICCE-RL-JSP. We used the provided train5.1.1. Instances ing script, which trains the model until the gap is smaller JSSP. We obtained 242 benchmark instances, and their than 20%. We trained five checkpoints. We trained the best-observed solutions from [24]. We reformulate model using CPU on an Arm-based Ampere A1 virtual each JSSP instance as its equivalent FJSP instance with machine with 4 CPUs and 24 GB of RAM in Oracle Cloud |ℳ | = 1, so FJSP models can process JSSP instances. Infrastructure with Ubuntu 20.04 64-bit operating system. For each model and JSSP instance, we run the experiment with 10 diferent seeds. We are interested in the gap given End-to-end-DRL-for-FJSP. We obtained only by the following equation one checkpoint from the GitHub repository [21]. = − best , best (1) jfsp-drl . We obtained five checkpoints from the GitHub repository for this model [22]. 5.2. Results 5.2.1. JSSP ∆ avg = , (2)

where is the average processing time of all operations, and is a load factor of the dynamic job shop. We use ∈ {1, 4} in the experiment.

We generate the DJSP instances from the JSSP instances and consider half the jobs as known and the rest as arriving jobs. We repeat each experiment with 10 diferent seeds. We are interested in the makespan because the gap is not available.

Average gaps are shown in Table 1. The boxplot of JSSP gaps is shown in Figure 6. Average runtimes are in Table 2.

We rejected the null hypothesis that the medians of the performance of the three best models (MWKR, fjsp-drl , and IEEE-ICCE-RL-JSP) are equal using the KruskalWallist test [27] ( < 5%). Therefore we can also reject the null hypothesis that the medians of the performance 5.1.2. Model checkpoints of all models are equal.

We compared all models pairwise using the MannL2D. From the L2D GitHub repository [15], we obtained Whitney U test [28] corrected for multiple hypotheses checkpoints trained on random instances with size 6x6, testing using the Holm method [29]. The best three mod10x10, 15x15, 20x15, 20x20, 30x15, 30x20. els were significantly diferent from the rest. The pairwise comparison of the three best models is shown in where is the makespan produced by the model, and best is the best-observed makespan.

FJSP. We obtained 402 benchmark instances, and their best-observed solutions from [25]. Again, we repeat the experiment for each model and each FJSP instance with 10 diferent seeds and calculate the gap.

DJSP. We designed an experiment inspired by [26]. We assume a set of known jobs at the beginning.

We then model the arrival of new jobs as a Poisson process, i.e., the arrival of two consecutive jobs follows an exponential distribution [26], where the average arrival time is

5.1.3. Baselines

JSSP. We compared the models with the 9 classic PDRs listed in Section 2.5. We used implementations given by the IEEE-ICCE-RL-JSP model as obtained from the GitHub repository [19].

FJSP. For FJSP, we used PDR implementations from the code of the model End-to-end-DRL-for-FJSP. For operation selection, we used FIFO, MOPNR, LWKR, and MWKR. For machine selection, we used EET and SPT.

EDD FIFO LOR LPT MWKR LS MOR ESnPdT-to-enSdR-PDTRL-for-FJSP fIjEsEpE-d-IrClCE-RL-JSP l2dwheatley Average FJSP gaps for diferent models are shown in Table 4. Average runtimes are shown in Table 5. A boxplot for FJSP gaps is shown in Figure 7 with logarithmic vertical scale.

We rejected the null hypothesis that the medians of gaps of all models and PDRs are equal using the KruskalWallis test ( < 5%). We also compared models and PDRs using the Holm-corrected Mann-Whitney U test [28]. Almost all pairwise comparisons produced statistically significant diferences ( < 5%). Comparisons that did not produce statistically significant diferences are MOPNR-SPT vs. MWKR-EET, MOPNR-SPT vs. MWKR-SPT, and MWKR-EET vs. MWKR-SPT.

MEWndK-Rto-S-ePnTd-DRL-for-FJSP fjsp-drl

6. Discussion

6.1. JSSP The results for MWKR we obtained are much better than the results reported by the authors of L2D [8]. We did not find their implementation of MWKR, so we could not investigate the diference. Average DJSP makespan with = 1; the lowest values are bold; "**" denotes best model better than the other two models, "*" denotes best model better only than the last model or the second best model better than the last model

Good performance of MWKR hints at the importance of the sum of processing times of remaining operations as an operation feature. The model IEEE-ICCE-RL-JSP used the number of remaining operations in the current job as an operation feature. The model fjsp-drl also included the number of unscheduled operations in the job as a feature. Other models didn’t use this feature. 6.2. FJSP The machine selection PDR SPT consistently performs significantly better than EET. The model End-to-endDRL-for-FJSP includes processing time as a feature ex6.3. DJSP The worse results yielded by Wheatley for all load factors can be a consequence of the poor performance in solving static JSSP.

The result of this experiment indicates that for more sparsely arriving jobs, L2D features (the lower bound of the estimated completion time) are more important. As the load factor increases, a DJSP is more similar to a static JSSP, and other features, which are more important for a JSSP, become more important, too. plicitly in the machine embeddings. The model fjsp-drl includes the time when the machine will finish all its currently assigned operations as a feature which is similar to the machine selection PDR EET. This may explain why the model fjsp-drl yielded worse results.

7. Conclusion and future work

In this paper, we compared five job scheduling models with available source code. We extended three of those models to the dynamic variant of job-shop scheduling. From our experiments, we observed that the selection of input features had a much more significant impact on model performance than the architecture of the model. In JSSP, the remaining work seemed to be the most crucial. In FJSP, it was processing time during machine selection. In DJSP, the lower bound of the estimated completion time seemed more influential for sparsely incoming jobs. For more densely incoming jobs, the remaining work seemed to be more relevant. Investigating the efect of existing input features and searching for new ones may be a valuable direction for future work. CoRR abs/1707.06347 (2017). URL: http://arxiv.org/ abs/1707.06347. arXiv:1707.06347. [17] G. Infantes, S. Roussel, P. Pereira, A. Jacquet, E. Benazera, Learning to solve job shop scheduling under uncertainty, in: B. Dilkina (Ed.), CPAIOR 2024: Integration of Constraint Programming, Artificial Intelligence and Operations Research (20th International Conference), Springer Nature, 2024. [18] jolibrain, wheatley, https://github.com/jolibrain/ wheatley/, 2024. Accessed 2024-01-09. [19] Jerry-Github-Cloud, Ieee-icce-rl-jsp, https://github. com/Jerry-Github-Cloud/IEEE-ICCE-RL-JSP, 2023.

Accessed 2024-03-17. [20] H. van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double q-learning, Proceedings of the AAAI Conference on Artificial Intelligence 30 (2016). URL: https://ojs.aaai.org/index. php/AAAI/article/view/10295. doi:10.1609/aaai. v30i1.10295. [21] L. Kun, End-to-end-drl-for-fjsp, https://github.com/ Lei-Kun/End-to-end-DRL-for-FJSP, 2022. Accessed 2024-03-17. [22] W. Song, fjsp-drl, https://github.com/songwenas12/ fjsp-drl, 2023. Accessed 2024-03-22. [23] jolibrain, wheatley - diferent arrival times feature request, https://github.com/jolibrain/wheatley/ issues/89, 2024. Accessed 2024-01-09. [24] Job shop instances and solutions, http://jobshop.

jjvh.nl/index.php, 2024. Accessed: 2024-03-28. [25] D. Behnke, M. J. Geiger, Test instances for the lfexible job shop scheduling problem with work centers (2012). doi:10.24405/436. [26] K. Lei, P. Guo, Y. Wang, J. Zhang, X. Meng, L. Qian, Large-scale dynamic scheduling for flexible jobshop with random arrivals of new jobs by hierarchical reinforcement learning, IEEE Transactions on Industrial Informatics 20 (2024) 1007–1018. doi:10.1109/TII.2023.3272661. [27] W. H. Kruskal, W. A. Wallis, Use of ranks in onecriterion variance analysis, Journal of the American Statistical Association 47 (1952) 583–621. doi:10. 1080/01621459.1952.10483441. [28] H. B. Mann, D. R. Whitney, On a test of whether one of two random variables is stochastically larger than the other, The annals of mathematical statistics (1947) 50–60. [29] S. Garcia, F. Herrera, An extension on" statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons., Journal of machine learning research 9 (2008).