<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks for Numeric Planning: A Preliminary Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Borelli</string-name>
          <email>valerio.borelli@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfonso Emilio Gerevini</string-name>
          <email>alfonso.gerevini@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Scala</string-name>
          <email>enrico.scala@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Serina</string-name>
          <email>ivan.serina@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Doctoral Consortium at the 23rd International Conference of the Italian Association for Artificial Intelligence, Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Brescia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>25</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>We consider the problem of learning heuristics for numeric planning domains using Graph Neural Networks. The problem has been approached multiple times, from different perspectives and with varying results, for classical planning, but it is relatively new for numeric planning. Our goal is to extend the work proposed by Ståhlberg, Bonet, and Geffner [1] to handle numeric planning problems. In recent years different works have shown how deep learning approaches can be used to solve planning problems, either as heuristic functions for classical planning, as value functions for general policies, or even as a separate system that can find plans with little to no external help from a planner or an agent. Interesting results for classical planning problems have been achieved with standalone systems, such as PlanGPT [2], which uses a GPT-based model to find a plan given a domain and a problem; with general policy approaches [1, 3, 4], which use different types of graph neural network architectures to learn a generalized policy for a classical planning domain; and with heuristic approaches, such as hypergraph networks [5] or relational heuristic networks [6]. The first extension that tries to tackle numeric planning problems is numeric ASNets [7], which builds on the architecture described in [4] and adds the possibility to reason about numeric fluents. In this work, we describe the main issues that need to be addressed in order to handle a numeric planning problem, and we propose to build upon the architecture described by Ståhlberg, Bonet, and Geffner [1]. Even though the scope of that architecture is to learn value functions for general policies, it can easily be used to learn heuristic values, as has already been done in [8]. The paper is organized as follows: first, we cover the background (numeric planning and GNNs); then we describe the architecture that we want to extend; finally, we explain the main issues that must be addressed to handle numeric planning problems.</p>
      </abstract>
      <kwd-group>
        <kwd>Numeric planning</kwd>
        <kwd>Graph Neural Networks</kwd>
        <kwd>Heuristic search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Numeric Planning</title>
      <p>A numeric planning problem is a tuple Π = ⟨s₀, V, A, G⟩, where V is the set of variables, which can be either propositional or numeric, s₀ is the initial state of the problem, A is a set of actions, either numeric or propositional, and G is the goal of the problem, which consists of a set of propositional and numeric goal conditions.</p>
      <p>An action a ∈ A is a pair ⟨pre(a), eff(a)⟩, in which pre(a) represents the set of preconditions that must be verified in a state s before executing the action, and eff(a) represents the set of effects of the action that will be applied to the actual state to reach the next state.</p>
      <p>A plan π = ⟨a₁, ..., aₙ⟩ is a sequence of actions, and is considered a solution for the planning problem if applying the sequence of actions, starting from the initial state of the problem, leads to a final state in which all the goals of the problem are satisfied.</p>
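      <p>To make these semantics concrete, the following is a minimal Python sketch of numeric state/action semantics and of the solution check above; the representation (dictionaries for states, callables for preconditions and effects) is our own illustrative choice, not a fragment of any specific planner:</p>
      <preformat>
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A state maps variable names to boolean or numeric values; preconditions are
# predicates over states, effects are functions from states to states.
State = Dict[str, Union[bool, float]]

@dataclass
class Action:
    name: str
    pre: Callable[[State], bool]   # pre(a): must hold in s before applying a
    eff: Callable[[State], State]  # eff(a): produces the successor state

def is_solution(s0: State, plan: List[Action], goal: Callable[[State], bool]) -> bool:
    """Check the solution condition: applying the plan from the initial state
    must lead to a final state in which all goal conditions are satisfied."""
    s = dict(s0)
    for a in plan:
        if not a.pre(s):
            return False
        s = a.eff(s)
    return goal(s)

# Example: a numeric action that increments a counter x by 1 while x &lt; 3.
inc = Action("inc", pre=lambda s: s["x"] &lt; 3,
             eff=lambda s: {**s, "x": s["x"] + 1})
print(is_solution({"x": 0.0}, [inc, inc, inc], goal=lambda s: s["x"] >= 3))  # True
      </preformat>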
    </sec>
    <sec id="sec-3">
      <title>3. Graph Neural Networks</title>
      <p>A Graph Neural Network (GNN) is a type of neural network specifically designed to work with graph-structured data. Traditional neural networks are designed to represent data structured in grids or sequences, but they struggle with real-world datasets structured as graphs, like social networks or molecular structures.</p>
      <p>A Graph Neural Network takes as input a graph G = (V, ℰ) and a set of node features X ∈ ℝ^{d×|V|}, where d represents the number of features for a node, and generates node embeddings z_u, ∀u ∈ V. These node embeddings can then be used to make predictions about nodes, edges, or even the entire graph.</p>
      <p>The nodes in the GNN exchange information during a message-passing iteration, with an iteration consisting of three operations:
• Message, in which messages are sent from each node towards its neighbors,
• Aggregation, in which all messages sent to a node are aggregated together into a unique message,
• Update, in which the node representation is updated using the aggregated message.</p>
      <p>Each node u in the network has a hidden embedding h_u^{(k)} that is updated after each iteration according to information aggregated from the neighborhood of u. The update of the embedding of a node u can be written as:</p>
      <p>m_{N(u)}^{(k)} = AGGREGATE^{(k)}({h_v^{(k)}, ∀v ∈ N(u)}) (1)</p>
      <p>h_u^{(k+1)} = UPDATE^{(k)}(h_u^{(k)}, m_{N(u)}^{(k)}) (2)</p>
      <p>where UPDATE and AGGREGATE are arbitrary differentiable functions, and m_{N(u)} is the aggregation of the messages that flow from u’s neighborhood towards u [9]. At each iteration k of the GNN, the AGGREGATE function takes as input, for each node, the set of node embeddings in its neighborhood and computes an aggregated message m_{N(u)}^{(k)} = AGGREGATE^{(k)}({h_v^{(k)}, ∀v ∈ N(u)}), ∀u ∈ V. The UPDATE function then combines the previous node embedding with the message generated by the AGGREGATE function for each node, to generate a new embedding h_u^{(k+1)} = UPDATE^{(k)}(h_u^{(k)}, m_{N(u)}^{(k)}), ∀u ∈ V.</p>
      <p>After K iterations of the GNN, the output of the final layer defines the embeddings for each node:</p>
      <p>z_u = h_u^{(K)}, ∀u ∈ V. (3)</p>
      <sec id="sec-3-1">
        <title>3.1. Node and Graph classification</title>
        <p>GNNs are used for three main tasks: node classification, graph classification, and relation (or link) prediction.</p>
        <p>Node classification is the most common task: the network is trained to classify each node, outputting a one-hot vector indicating the possible class of the node.</p>
        <p>Graph classification is similar to node classification; the main difference is that instead of classifying nodes, the GNN is trained to classify the whole graph or part of it. To do that, the node embeddings of the final layer (eq. 3) are used as input to a READOUT function that summarizes the node embeddings and classifies the graph. Lately, this has also been extended with success to graph regression; in these instances, instead of a softmax classification loss function, a squared-error loss function is usually used. In order to learn either a value function for general policies or a heuristic for planning, the task of the GNN is graph regression.</p>
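        <p>As a minimal sketch (sum pooling and a one-hidden-layer MLP are chosen purely for illustration, not taken from [1]), a READOUT for graph regression can look as follows:</p>
        <preformat>
import numpy as np

def readout_value(Z, W1, b1, w2, b2):
    """READOUT for graph regression: summarize the final node embeddings z_u
    into a single vector, then map it to a scalar with a small MLP."""
    g = Z.sum(axis=0)                      # sum-pool over all node embeddings
    hidden = np.maximum(0.0, W1 @ g + b1)  # one hidden ReLU layer
    return float(w2 @ hidden + b2)         # scalar output, e.g. V(s)

def squared_error_loss(pred, target):
    # For graph regression a squared-error loss replaces the softmax loss.
    return (pred - target) ** 2
        </preformat>
      </sec>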
    </sec>
    <sec id="sec-4">
      <title>4. Graph Neural Networks for Classical Planning</title>
      <p>The GNN architecture for learning heuristic values follows the one used by Ståhlberg, Bonet and Geffner [1, 3] for learning value functions, in which, instead of a graph, our networks represent relational structures.</p>
      <p>This is because a state s in planning doesn’t represent a graph, but rather a relational structure over the objects of the problem. The relational structure for a state s is defined by the set of objects, the set of domain predicates, and the atoms p(o₁, ..., oₘ) that are true in s. In particular, the set of domain predicates is fixed for a specific domain, the set of objects may change from one instance to another, and the value of the atoms is specific to the state s. These considerations lead to a structure in which the objects represent the nodes of our network, the predicates are the possible types of edges, and the atoms define the adjacency: if p(o₁, o₂) is true in a state s, the relational structure for s will have an edge of type p between o₁ and o₂. Technically, instead of messages flowing directly from objects to objects, they flow from the objects towards the atoms q in which they are involved, and from q to the objects involved in q.</p>
      <p>The types of edges in the relational structure for a particular planning domain are given by the lifted predicates of the domain and the goal predicates. The GNNs map planning states s into real values V(s). A practical example with blocksworld (arities in parentheses):
• Predicates: on(2), handempty(0), ontable(1), clear(1), holding(1)
• Goal predicates: on_goal(2), handempty_goal(0), ontable_goal(1), clear_goal(1), holding_goal(1)
• Objects: blocks a, b, c, d
• Actions: pickup(?b), putdown(?b), stack(?b1,?b2), unstack(?b1,?b2)
• Current state: (ontable a) (on b a) (on c b) (on d c) (clear d) (handempty)
• Goal: (on b a) (on c b) (on a d)
As we can see in the example in Fig. 1, the nodes of the GNN are the objects of the planning problem, but instead of having a graph in which messages flow from an object towards its neighbors, we have a relational structure in which messages flow from objects o_i to the true atoms q in s that include o_i, with q = p(o₁, ..., oₘ), 1 ≤ i ≤ m, and from such atoms q to all the objects o_i involved in q (Fig. 2).</p>
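      <p>As a small illustrative sketch (the data layout is our own choice, not the implementation of [1]), the object–atom structure over which messages flow can be built for the current state above as follows:</p>
      <preformat>
# Object/atom message-passing structure for the blocksworld state
# (ontable a) (on b a) (on c b) (on d c) (clear d) (handempty).
objects = ["a", "b", "c", "d"]
atoms = [("ontable", ("a",)), ("on", ("b", "a")), ("on", ("c", "b")),
         ("on", ("d", "c")), ("clear", ("d",)), ("handempty", ())]

# Messages flow from each object o_i to every true atom q = p(o_1, ..., o_m)
# that involves o_i, and back from q to all the objects involved in q.
edges = [(obj, q_idx)
         for q_idx, (_pred, args) in enumerate(atoms)
         for obj in args]
print(edges[:4])  # [('a', 0), ('b', 1), ('a', 1), ('c', 2)]
      </preformat>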
      <p>The aggregation functions used follow those of [1, 3], and the same aggregation function is applied to all the nodes in the GNN (i.e., to all the objects in the problem) in every layer. The same update function is then applied to every node in every layer, combining the previous embedding of the node with the array given as output by the aggregation function for that node.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Extension to Numeric Planning</title>
      <p>In classical planning, in order to uniquely describe a state for a particular problem, it is enough to know the truth values of the predicates in the current state and the predicates that must be true in the goal. While in classical planning action preconditions, action effects and goals can be described only with the truth values of predicates, in numeric planning action preconditions and goals might be described as numeric conditions, and action effects as numeric assignments. For example, a goal could be f(o₁) + 3 &lt; f(o₂), where f is a numeric fluent.</p>
      <p>Therefore, to extend the previous architecture to numeric planning, different issues must be addressed:
• Handling numeric fluents is necessary to obtain an input state for the GNN that can uniquely represent a state of our numeric planning problem (one purely illustrative encoding is sketched after this list),
• Numeric goals must have a different representation, since they can’t be directly represented by the numeric fluents of the problem,
• Action preconditions might also be taken into account; this wasn’t useful before, since they were truth values of the boolean predicates, but now they represent different numeric conditions and hence they might be useful information to exploit in our architecture.</p>
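      <p>As a purely illustrative sketch of the first two issues, one possible encoding, which is our own assumption and not a design settled in this work, attaches the value of each numeric fluent to its atom node and turns each numeric goal condition into an extra node whose feature measures how far the condition is from being satisfied:</p>
      <preformat>
import numpy as np

# Purely illustrative encoding (our assumption, not a settled design): numeric
# fluents become atom-node features, and each numeric goal condition becomes an
# extra node carrying how far the condition currently is from being satisfied.

def fluent_node_feature(value: float) -> np.ndarray:
    # Feature vector of a numeric-fluent atom node: [is_numeric flag, value].
    return np.array([1.0, value])

def numeric_goal_node_feature(lhs: float, rhs: float) -> np.ndarray:
    # For a goal condition lhs &lt; rhs, e.g. f(o₁) + 3 &lt; f(o₂): the feature is
    # the amount still missing before the condition holds (0 if satisfied).
    return np.array([max(0.0, lhs - rhs)])

# With f(o₁) = 2 and f(o₂) = 4, the goal f(o₁) + 3 &lt; f(o₂) is violated by 1:
print(numeric_goal_node_feature(2 + 3, 4))  # [1.]
      </preformat>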
      <p>Our goal is to extend the existing architecture to solve these issues, and then to train our models with a dataset generated using ENHSP [10] as a teacher, with optimal plans when possible given the domain, and with the best combination of search algorithm and heuristic when finding the optimal solution is not feasible.</p>
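      <p>The dataset generation can be pictured as follows; the sketch assumes hypothetical solve, initial_state and successor helpers (they stand for a wrapper around the teacher planner and a state-transition model, and are not part of ENHSP's actual interface):</p>
      <preformat>
from typing import Callable, Dict, List, Tuple

def build_dataset(problems: List[str],
                  solve: Callable[[str], List[str]],
                  initial_state: Callable[[str], Dict],
                  successor: Callable[[Dict, str], Dict]) -> List[Tuple[Dict, int]]:
    """Label every state along a teacher plan with its cost-to-go: the i-th
    state of an n-step plan gets target n - i, which is the value the GNN is
    trained to regress to. `solve` stands for a wrapper around the teacher
    planner (e.g. ENHSP); its interface here is assumed for illustration."""
    dataset: List[Tuple[Dict, int]] = []
    for problem in problems:
        plan = solve(problem)                # teacher plan for this instance
        state = initial_state(problem)
        n = len(plan)
        for i, action in enumerate(plan):
            dataset.append((state, n - i))   # n - i actions left to the goal
            state = successor(state, action)
        dataset.append((state, 0))           # the goal state has cost-to-go 0
    return dataset
      </preformat>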
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary results</title>
      <p>We are working on different approaches to handle numeric fluents and distances. While we still have to test the new architectures that we created within a planner, we already have some interesting results regarding the models obtained; in Table 1 we show the minimum squared error obtained after training the models, using the aggregation functions that performed best on First-Order Counters and on Block Grouping, respectively.</p>
      <p>If we consider that suboptimal plans can have hundreds of actions in those domains, especially in First-Order Counters, an MSE of 0.16 is a good result.</p>
      <p>From here, the next step will be testing the models inside a planner, to compare them to traditional
numeric heuristics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Learning generalized policies without supervision using GNNs</article-title>
          ,
          <source>arXiv preprint arXiv:2205.06002</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rossetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tummolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Gerevini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Putelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Serina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olivato</surname>
          </string-name>
          ,
          <article-title>Learning general policies for planning through GPT models</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>34</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>500</fpage>
          -
          <lpage>508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>32</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>629</fpage>
          -
          <lpage>637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Toyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Trevizan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Action schema networks: Generalised policies with deep learning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Trevizan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <article-title>Learning domain-independent planning heuristics with hypergraph networks</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>30</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>574</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Karia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Learning generalized relational heuristic networks for model-agnostic planning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>8064</fpage>
          -
          <lpage>8073</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <article-title>Learning generalised policies for numeric planning</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>34</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>633</fpage>
          -
          <lpage>642</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Muninn</article-title>
          ,
          <source>Tenth International Planning Competition (IPC-10) Learning Track: Planner Abstracts</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <source>Graph representation learning</source>
          , Morgan &amp; Claypool Publishers,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haslum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramirez</surname>
          </string-name>
          ,
          <article-title>Interval-based relaxation for general numeric planning</article-title>
          ,
          <source>in: ECAI</source>
          <year>2016</year>
          , IOS Press,
          <year>2016</year>
          , pp.
          <fpage>655</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>