<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>GymPN: a Python Library for Automated Decision-Making in Process Management Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Lo Bianco</string-name>
          <email>r.lo.bianco@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Willem van Jaarsveld</string-name>
          <email>w.l.v.jaarsveld@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remco Dijkman</string-name>
          <email>r.m.dijkman@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>GymPN is a Python library for modeling and solving business process decision-making problems using Deep Reinforcement Learning (DRL). Built on top of SimPN, it integrates Petri Net-based simulation with DRL training pipelines, enabling users to define, train, and evaluate decision policies with minimal configuration. GymPN supports both heuristic and learning-based approaches, and includes features for modeling partially observable processes as well as decisions at different steps of the process. The library is open-source, easy to use, and has been validated in a wide range of business process scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>Petri Net</kwd>
        <kwd>Deep Reinforcement Learning</kwd>
        <kwd>Optimization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Petri Nets (PNs) represent an established mathematical formulation for the simulation of business
processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In recent years, an extension of the PN formalism, namely, Action-Evolution Petri
Nets (A-E PNs), was proposed to enable modeling and solving optimization problems in business
processes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As an example, let us consider the problem in Figure 1, which represents a simple
dynamic task assignment problem.
      </p>
      <fig id="fig1">
        <caption>
          <p>Figure 1: An A-E PN model of a simple dynamic task assignment problem, with places Arrival, Waiting, Busy, and Resources, an evolution transition Arrive, an action transition Start with firing delay tf(X,Y), and an evolution transition Complete with reward function r = 1.</p>
        </caption>
      </fig>
      <p>Two cases, each composed of a single task, enter the system at every time unit. Two resources are
present, with different proficiencies on each task type. This is modeled through the tf function, so that
resources whose resource_id is the same as the task_type they are assigned to take 1 time unit to complete
the task, and 2 time units otherwise. A reward of 1 is produced every time a case is completed. Thus, the
objective is to minimize the cycle time of cases (equivalent to maximizing the reward). The goal of the
optimization is to find a policy function that maps each observation (a given marking of the PN) to a
combination of task and resource that maximizes the reward over an episode.</p>
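      <p>As a minimal sketch, the tf delay function described above can be written as follows (the function name tf is taken from Figure 1; the full model in Section 3.1 inlines this logic in the start behavior):</p>
      <preformat>def tf(task, resource):
    # Matching resource_id and task_type: 1 time unit; otherwise: 2.
    return 1 if task['task_type'] == resource['resource_id'] else 2</preformat>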
      <p>
        The A-E PN formalism was conceived to combine elements of simulation with elements of
sequential decision-making, with a particular focus on Deep Reinforcement Learning (DRL). However, the
theoretical foundations of A-E PN have not yet been implemented in a well-structured, easy-to-use
software package. GymPN provides such a tool in the form of a Python package. Based on the recently
published SimPN library [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], GymPN maintains the same design principles as SimPN, integrating all
elements of A-E PNs in a seamless fashion. This tight integration allows for using most of SimPN’s
features, such as reporting, support for higher-level modeling languages, and visualizations. At the
same time, it greatly simplifies adoption for existing SimPN users.
      </p>
      <p>The remainder of this work is structured as follows: Section 2 discusses alternative optimization
packages, presenting the advantages that GymPN offers; Section 3 presents the modeling, simulation,
and optimization features of the library; Section 4 discusses the maturity of the library.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Several DRL-based approaches for automated decision-making in process management systems have
been proposed [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. However, all these approaches are tailored to a single problem instance, and they
do not ofer utilities to model new problems. By contrast, GymPN integrates the theory of A-E PN [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ]
with the formal syntax of SimPN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover, GymPN extends the capabilities of A-E PN, adding
support for partial observability and multi-decision processes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        GymPN can be considered a software package for sequential decision-making. However, compared
to other publicly available Python packages, like Gymnasium [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (internally used in GymPN) or
OR-Gym [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], it ofers unique utilities to model the desired problem using business process notation.
Moreover, it does not require the user to define optimization-specific elements (except for the objective),
allowing one to focus on the problem definition instead.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Features</title>
      <p>GymPN uses the same syntax as SimPN, integrating the elements necessary to define a business process
decision-making problem, i.e., what decisions need to be made (action transitions) and a measure
of the goodness of such decisions (rewards). To illustrate GymPN’s capabilities, we model a simple task
assignment problem. We then show how to train a DRL agent and compare it with a
heuristic policy.</p>
      <sec id="sec-3-1">
        <title>3.1. Modeling</title>
        <p>In this section, we implement the problem presented in Figure 1 in GymPN.</p>
        <preformat># GymProblem and SimToken are assumed to be imported from the GymPN
# and SimPN packages, respectively.
agency = GymProblem()

# Places, each declaring the attributes of the tokens it holds.
arrival = agency.add_var("arrival", var_attributes=['task_type'])
waiting = agency.add_var("waiting", var_attributes=['task_type'])
busy = agency.add_var("busy", var_attributes=['task_type', 'resource_id'])

# Initial marking: one case generator per task type.
arrival.put({'task_type': 0})
arrival.put({'task_type': 1})

resources = agency.add_var("resources", var_attributes=['resource_id'])
resources.put({'resource_id': 0})
resources.put({'resource_id': 1})

# Evolution transition: a new case of each type arrives every time unit.
def arrive(a):
    return [SimToken(a, delay=1), SimToken(a)]
agency.add_event([arrival], [arrival, waiting], arrive)

# Action transition: a matching resource takes 1 time unit, 2 otherwise.
def start(c, r):
    if c['task_type'] == r['resource_id']:
        return [SimToken((c, r), delay=1)]
    else:
        return [SimToken((c, r), delay=2)]
agency.add_action([waiting, resources], [busy], behavior=start, name="start")

# Evolution transition: completing a case frees its resource and yields
# a reward of 1.
def complete(b):
    return [SimToken(b[1])]
agency.add_event([busy], [resources], complete, name='complete', reward_function=lambda x: 1)</preformat>
        <p>The main class in GymPN is GymProblem (an extension of SimPN’s SimProblem), which can be seen as
an initially empty Petri Net. Places are added to the PN using the add_var method. The initial marking is
defined by inserting tokens into the places through the put function. GymPN assumes that the attributes
of the tokens are expressed as dictionaries, and the attributes of the tokens in every place must be
specified via the var_attributes parameter. This is necessary to ensure that the observations for
DRL are derived correctly from the marking. Evolution transitions (the classical non-deterministic
transitions of Timed Colored Petri Nets), such as the arrival transition, are added to the problem via
the add_event function, in the same way as they are handled in SimPN. Action transitions (transitions
for which a policy chooses the tokens to be used for firing) are defined in a similar fashion via the
add_action function. For any transition, it is possible to specify a reward function that is called every
time the transition fires. In the example, only the complete transition produces a reward of 1. This means
that, over 10 time units, the maximum cumulative reward is 20.</p>
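        <p>As a clarifying sketch, the complete behavior in the listing above returns b[1] because tokens in the busy place carry the (case, resource) pairs produced by start; returning the second element puts the resource back into the resources place:</p>
        <preformat># Equivalent to the complete behavior in the listing above.
def complete(b):
    case, resource = b           # b is the (c, r) pair built by start
    return [SimToken(resource)]  # same as SimToken(b[1])</preformat>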
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Training Policies</title>
        <p>To train a DRL policy on the provided example, it is sufficient to run the training_run function:</p>
        <preformat>agency.training_run(length=10)</preformat>
        <p>The training_run function runs a given number of training epochs (100 by default) of the Proximal
Policy Optimization (PPO) algorithm. The default hyperparameters and other configuration variables
can be modified by passing a dictionary to the args_dict parameter of the training_run function.</p>
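        <p>As an illustrative sketch, overriding the defaults could look as follows (the dictionary keys below are hypothetical; the supported configuration names are listed in the GymPN documentation):</p>
        <preformat># Hypothetical configuration keys, shown only to illustrate the mechanism.
custom_args = {
    'epochs': 200,  # assumed key: number of training epochs
    'lr': 3e-4,     # assumed key: PPO learning rate
}
agency.training_run(length=10, args_dict=custom_args)</preformat>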
        <p>By default, running the training_run function opens a TensorBoard dashboard that displays the
progress of the training process. In Figure 2, we report two representative graphs from the
dashboard.</p>
        <fig id="fig2">
          <caption>
            <p>Figure 2: (a) A decreasing policy loss indicates that learning is progressing. (b) The (deterministic) policy is tested during training, showing convergence.</p>
          </caption>
        </fig>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Testing Policies</title>
        <p>The GymSolver class is responsible for mapping a PN marking to a suitable observation and for returning
an action (i.e., which tokens to use to fire an action transition) according to a policy function. Having
defined the solver, the testing_run function is used to simulate an episode using the desired DRL policy,
by retrieving the weights of the trained neural network.</p>
        <preformat>solver = GymSolver(weights_path='./weights.pth', metadata=agency.make_metadata())
res = agency.testing_run(length=10, solver=solver)</preformat>
        <p>In the example, the DRL policy produces a reward of 20 over 10 time units, which is the maximum. This
is in line with the results observed during the training phase, as reported in Figure 2b.</p>
        <p>A specialized solver, namely HeuristicSolver, is available to implement heuristic policies. In the
presented example, we can develop a simple heuristic policy that always assigns the resource with a
given resource_id to a task with the same task_type. The heuristic policy function takes two parameters
representing, respectively, the observable part of the Petri Net and the list of all possible combinations
of tokens that can be used to fire an action transition.</p>
        <preformat>def perfect_heuristic(observable_net, tokens_comb):
    for k, el in tokens_comb.items():
        for binding in el:
            # A binding pairs a waiting task with a resource; .value holds
            # the token's attribute dictionary.
            task = binding[0][1].value
            resource = binding[1][1].value
            # Choose the first binding in which the resource matches the task.
            if task['task_type'] == resource['resource_id']:
                return {k: binding}

solver = HeuristicSolver(perfect_heuristic)
res = agency.testing_run(length=10, solver=solver)</preformat>
        <p>It is easy to see that this is the optimal policy, and it also produces a reward of 20 over 10 time units.</p>
        <p>Similarly, we can use the RandomSolver class to implement a random assignment policy.</p>
        <preformat>solver = RandomSolver()
res = agency.testing_run(length=10, solver=solver)</preformat>
        <p>The random assignment policy is clearly suboptimal, producing an average reward of 15 over 10 time
units.</p>
        <p>For logging, the testing_run function is compatible with the SimPN Reporter class. Moreover, the
library provides an interactive graphical visualization of the behavior of the policy, as shown in Figure 3.</p>
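        <p>As a hedged sketch, logging an episode with SimPN’s SimpleReporter could look as follows (we assume testing_run forwards a reporter in the same way as SimPN’s simulate function; check the GymPN documentation for the exact parameter name):</p>
        <preformat>from simpn.reporters import SimpleReporter

# 'reporter' is an assumed parameter name, mirroring SimPN's simulate.
res = agency.testing_run(length=10, solver=solver, reporter=SimpleReporter())</preformat>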
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Tool Maturity</title>
      <p>
        GymPN was made available as part of a publication [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that introduced support for multiple action types
and partial observability in the A-E PN framework. This publication included a set of eight example
problems that cover basic business process management patterns. In the documentation, more complex
examples and features are discussed, and a method to automatically derive GymPN simulations from
real-world process logs is being developed. At the time of writing, the repository is open-source and
under active development.
      </p>
      <p>Acknowledgments. This project was developed as part of the AI (Artificial Intelligence) Planner of
The Future research program, an initiative of the European Supply Chain Forum (ESCF).</p>
      <p>Declaration on Generative AI. During the preparation of this work, the author(s) used Grammarly
and Microsoft Copilot for grammar and spelling checks. After using these tools/services, the
author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s
content.</p>
    </sec>
    <sec id="sec-5">
      <title>A. Online Resources</title>
      <list list-type="bullet">
        <list-item>
          <p>GymPN is available on GitHub at https://github.com/bpogroup/gympn</p>
        </list-item>
        <list-item>
          <p>An introductory video is available on YouTube at https://youtu.be/2N7DW67NxqI</p>
        </list-item>
      </list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Kristensen</surname>
          </string-name>
          , L. Wells,
          <article-title>Coloured Petri Nets and CPN Tools for modelling and validation of concurrent systems</article-title>
          ,
          <source>International Journal on Software Tools for Technology Transfer</source>
          <volume>9</volume>
          (
          <year>2007</year>
          )
          <fpage>213</fpage>
          -
          <lpage>254</lpage>
          . URL: https://link.springer.com/10.1007/s10009-007-0038-x. doi:
          <volume>10</volume>
          .1007/ s10009- 007- 0038- x.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lo</surname>
          </string-name>
          <string-name>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nuijten</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. Van Jaarsveld</surname>
          </string-name>
          ,
          <article-title>Action-evolution Petri Nets: A Framework for Modeling and Solving Dynamic Task Assignment Problems</article-title>
          , volume
          <volume>14159</volume>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>231</lpage>
          . doi:
          <volume>10</volume>
          .1007/978- 3-
          <fpage>031</fpage>
          - 41620- 0_13, series Title: Lecture Notes in Computer Science.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <article-title>Simpn: A python library for modeling and simulating timed, colored petri nets</article-title>
          , in: BPM-D
          <year>2024</year>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>75</lpage>
          . Publisher Copyright:
          <article-title>© 2024 Copyright for this paper by its authors</article-title>
          .
          <source>; Best Dissertation Award, Doctoral Consortium, and Demonstration and Resources Forum at 22nd International Conference on Business Process Management</source>
          ,
          <string-name>
            <surname>BPM-D 2024</surname>
          </string-name>
          ; Conference date:
          <fpage>01</fpage>
          -
          <lpage>09</lpage>
          -2024 Through 06-
          <fpage>09</fpage>
          -
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shyalika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karunananda</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning in Dynamic Task Scheduling: A Review</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>1</volume>
          (
          <year>2020</year>
          ).
          <source>Publisher: Springer Science and Business Media LLC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. Lo</given-names>
            <surname>Bianco</surname>
          </string-name>
          , W. v. Jaarsveld,
          <string-name>
            <given-names>J.</given-names>
            <surname>Middelhuis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Begnardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <article-title>Automated decision-making for dynamic task assignment at scale</article-title>
          ,
          <year>2025</year>
          . doi:
          <volume>10</volume>
          .48550/arXiv.2504.19933, arXiv:
          <fpage>2504</fpage>
          .19933 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lo</surname>
          </string-name>
          <string-name>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nuijten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Van</given-names>
            <surname>Jaarsveld</surname>
          </string-name>
          ,
          <article-title>A universal Approach to Feature Representation in Dynamic Task Assignment Problems</article-title>
          , in: A.
          <string-name>
            <surname>Marrella</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Resinas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Jans</surname>
          </string-name>
          , M. Rosemann (Eds.),
          <source>Business process Management Forum</source>
          , volume
          <volume>526</volume>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>213</lpage>
          . doi:
          <volume>10</volume>
          .1007/978- 3-
          <fpage>031</fpage>
          - 70418- 5_12,
          <string-name>
            <surname>series</surname>
            <given-names>Title</given-names>
          </string-name>
          <source>: Lecture Notes in Business Information Processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. van Jaarsveld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <article-title>Gympn: A library for decision-making in process management systems</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2506.20404. arXiv:
          <volume>2506</volume>
          .
          <fpage>20404</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Towers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kwiatkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Terry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. U.</given-names>
            <surname>Balis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Cola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deleu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goulão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kallinteris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krimmel</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. KG</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perez-Vicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulhof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. G.</given-names>
            <surname>Younis</surname>
          </string-name>
          ,
          <article-title>Gymnasium: a Standard Interface for Reinforcement Learning Environments,</article-title>
          <year>2024</year>
          . ArXiv:
          <volume>2407</volume>
          .17032 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Hubbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Sahinidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Grossmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Wassick</surname>
          </string-name>
          ,
          <string-name>
            <surname>OR-Gym</surname>
          </string-name>
          :
          <article-title>A Reinforcement Learning Library for Operations Research Problems</article-title>
          ,
          <year>2020</year>
          . ArXiv:
          <year>2008</year>
          .06319 [cs].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>