Multi-Agent Reinforcement Learning tool for Job Shop Scheduling Problems

Jessica Coto Palacio 1,2, Yailen Martínez Jiménez 2, Ann Nowé 3

1 UEB Hotel Los Caneyes, Ave. Los Eucaliptos y Circunvalación, Santa Clara, Cuba. jcotopalacio@gmail.com
2 Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Villa Clara, Cuba. yailenm@uclv.edu.cu, jcoto@uclv.cu
3 Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium. ann.nowe@ai.vub.ac.be

Keywords: Job Shop Scheduling · Industry 4.0 · Reinforcement Learning · Multi-Agent Systems.

1 Introduction

In recent years, technological developments have increasingly benefited industry performance. The appearance of new information technologies has given rise to intelligent factories in what is termed Industry 4.0 (i4.0) [8, 7]. The i4.0 revolution involves the combination of intelligent and adaptive systems using shared knowledge among diverse heterogeneous platforms for computational decision-making [13, 9]. In this sense, embedding Multi-Agent Systems (MAS) is a highly promising approach to handling complex and dynamic problems. A typical example of an industrial opportunity of this kind is scheduling, whose goal is resource optimization and minimization of the total execution time [11]. Given the complexity and dynamism of industrial environments, solving this type of problem may require very complex solutions, as customer orders have to be executed on the available resources. In real-world scheduling problems, the environment is subject to constant uncertainty: machines break down, orders take longer than expected, and these unexpected events can make the original schedule fail [15, 6]. Accordingly, the Job Shop Scheduling Problem (JSSP) is considered one of the hardest manufacturing problems in the literature [1]. In [3] and [4], the authors suggested and analyzed the application of reinforcement learning techniques to solve the JSSP.
They demonstrated that interpreting and solving this kind of scenario as a multi-agent learning problem is beneficial for obtaining near-optimal solutions and can compete very well with alternative approaches. Another problem that has been identified in the scheduling community is the fact that most of the research concentrates on optimization problems that are simplified versions of reality. As the author points out in [12], this allows the use of sophisticated approaches and guarantees in many cases that optimal solutions are obtained, but the exclusion of real-world restrictions harms the applicability of those methods. What the industry needs are systems that adjust exactly to the conditions in the production plant.

“Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).”

2 Multi-Agent Reinforcement Learning Tool

The MARL tool proposed in this research allows the user either to keep the best result obtained by a learning algorithm or to include extra constraints from the production floor. This first version allows the user to fix operations to time intervals on the corresponding resources and afterwards optimize the solution subject to the new constraints added by the user. This is a first approach towards closing the gap between literature and practice. The tool groups several algorithms aimed at solving scheduling problems in the manufacturing industry. It focuses on the need to build a more flexible schedule, one that can be adjusted to the user's requests without violating the restrictions of the JSSP scenario. The approach used to obtain the original solution, which the user can afterwards modify, is the one proposed in [10]: a generic multi-agent reinforcement learning approach that can be easily adapted to different scheduling settings, such as the Flexible Job Shop (FJSSP) [5] or the Parallel Machines Job Shop Scheduling (PMJSSP) [14].
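The core idea of fixing operations and validating user moves against the JSSP restrictions can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `Operation` data structure and the `is_valid` check are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Operation:
    job: int        # job the operation belongs to
    index: int      # position of the operation within its job
    machine: int    # resource that must execute it
    start: int      # scheduled start time
    duration: int   # processing time

def is_valid(schedule):
    """Check the two JSSP constraints: no overlap on a machine,
    and operations of a job run in their given order."""
    for a in schedule:
        for b in schedule:
            if a is b:
                continue
            # machine capacity: two operations on the same resource
            # must not overlap in time
            if a.machine == b.machine:
                if a.start < b.start + b.duration and b.start < a.start + a.duration:
                    return False
            # precedence: within a job, operation i must finish
            # before operation i+1 starts
            if a.job == b.job and a.index < b.index:
                if a.start + a.duration > b.start:
                    return False
    return True

# A user drags an operation to a new start time; the move is only
# accepted (e.g. when validating mouse/touch moves) if the resulting
# schedule is still valid.
schedule = [
    Operation(job=0, index=0, machine=0, start=0, duration=3),
    Operation(job=0, index=1, machine=1, start=3, duration=2),
    Operation(job=1, index=0, machine=1, start=5, duration=4),
]
assert is_valid(schedule)
schedule[2].start = 4          # now overlaps job 0's operation on machine 1
assert not is_valid(schedule)
```

A fixed operation would simply be one whose `start` the optimizer is not allowed to change while it reschedules the rest.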
The algorithm used is Q-Learning, which works by learning an action-value function that gives the expected utility of taking a given action in a given state. There is essentially one agent per machine, which takes care of allocating the operations that must be executed by the corresponding resource. Once the user chooses the scheduling scenario to solve (JSSP, FJSSP or PMJSSP), the tool proposes an initial solution based on the QL algorithm and, at the same time, enables the set of options that are the basis of this research. The user can move operations using either the mouse or the touch screen, and these movements must be validated once the new positions are decided.

3 Experimental Results and Conclusions

In this work we compare the performance of two alternatives for optimizing the schedule once the user has fixed some operations: classical left shifting and a modified Q-Learning algorithm, which includes the positions of the fixed operations in the learning process. To measure the performance of both alternatives, several benchmark problems from the OR-Library [2] were used. For each instance, the same operations were fixed, and each optimization alternative had to adjust the schedule in order to minimize the makespan. The Wilcoxon test applied to the results shows that there are significant differences between the two alternatives (sig = 0.08), and the mean ranks confirm that the QL version with fixed operations is able to obtain better results than the classical optimization process. This is mainly because left shifting respects the order in which the operations were initially placed along the X-axis.
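The tabular Q-Learning rule underlying this kind of machine agent can be sketched as follows. The state encoding, reward function, and parameter values here are illustrative assumptions, not the paper's actual design.

```python
import random

def q_learning_step(Q, state, actions, reward_fn, next_state_fn,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    """One epsilon-greedy Q-Learning update for a machine agent.
    `actions` stands for the operations currently waiting for this machine."""
    # epsilon-greedy selection over the waiting operations
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))
    reward = reward_fn(state, action)
    next_state = next_state_fn(state, action)
    # value of the best action in the successor state
    best_next = max((Q.get((next_state, a), 0.0) for a in actions), default=0.0)
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return action, next_state

# One step with dummy reward/transition functions:
random.seed(0)
Q = {}
action, s1 = q_learning_step(
    Q, state="s0", actions=[0, 1],
    reward_fn=lambda s, a: 1.0,      # e.g. a makespan-based reward
    next_state_fn=lambda s, a: "s1",
)
```

The modified version compared in the experiments would additionally treat the user-fixed operations as unavailable actions, keeping their positions untouched while the remaining operations are rescheduled.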
The QL algorithm, on the other hand, keeps the fixed positions, and during the learning process the order in which the operations are scheduled on the resources does not have to be the same; this allows the approach to obtain better solutions in terms of makespan.

References

1. Asadzadeh, L.: A local search genetic algorithm for the job shop scheduling problem with intelligent agents. Computers & Industrial Engineering 85, 376–383 (2015)
2. Beasley, J.E.: OR-Library: Distributing test problems by electronic mail. Journal of the Operational Research Society 41(11), 1069–1072 (1990)
3. Gabel, T.: Multi-Agent Reinforcement Learning Approaches for Distributed Job-Shop Scheduling Problems. Ph.D. thesis, Universität Osnabrück (2009)
4. Gabel, T., Riedmiller, M.: On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks. In: IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 68–75. Honolulu, USA (2007)
5. Gavin, R., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. rep., Engineering Department, Cambridge University (1994)
6. Hall, N., Potts, C.: Rescheduling for new orders. Operations Research 52, 440–453 (2004)
7. Leitao, P., Colombo, A., Karnouskos, S.: Industrial automation based on cyber-physical systems technologies: Prototype implementations and challenges. Computers in Industry 81, 11–25 (2016)
8. Leitao, P., Rodrigues, N., Barbosa, J., Turrin, C., Pagani, A.: Intelligent products: The GRACE experience. Control Engineering Practice 42, 95–105 (2005)
9. Leusin, M.E., Frazzon, E.M., Uriona Maldonado, M., Kück, M., Freitag, M.: Solving the Job-Shop Scheduling Problem in the Industry 4.0 Era. Technologies 6(4) (2018)
10. Martínez Jiménez, Y.: A Generic Multi-Agent Reinforcement Learning Approach for Scheduling Problems. Ph.D. thesis, Vrije Universiteit Brussel, Brussels (2012)
11. Toader, F.A.: Production Scheduling in Flexible Manufacturing Systems: A State of the Art Survey 3(7), 1–6 (2017)
12. Urlings, T.: Heuristics and metaheuristics for heavily constrained hybrid flowshop problems. Ph.D. thesis (2010)
13. Vogel-Heuser, B., Lee, J., Leitao, P.: Agents enabling cyber-physical production systems. AT-Autom. 63, 777–789 (2015)
14. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
15. Xiang, W., Lee, H.: Ant colony intelligence in multi-agent dynamic manufacturing scheduling. Engineering Applications of Artificial Intelligence 21, 73–85 (2008)