<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks for Numeric Planning: A Preliminary Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Borelli</string-name>
          <email>valerio.borelli@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfonso Emilio Gerevini</string-name>
          <email>alfonso.gerevini@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Scala</string-name>
          <email>enrico.scala@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Serina</string-name>
          <email>ivan.serina@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Doctoral Consortium at the 23rd International Conference of the Italian Association for Artificial Intelligence, Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Brescia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>25</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>We consider the problem of learning heuristics for numeric planning domains using Graph Neural Networks. The problem has been approached multiple times, from different perspectives and with varying results, for classical planning, but it is relatively new for numeric planning. Our goal is to extend the work proposed by Ståhlberg, Bonet, and Geffner [1] to handle numeric planning problems. In recent years different works have shown how deep learning approaches can be used to solve planning problems, either as heuristic functions for classical planning, as value functions for general policies, or even as a separate system that can find plans with little to no external help from a planner or an agent. Interesting results for classical planning problems have been achieved with standalone systems, such as PlanGPT [2], which uses a GPT-based model to find a plan given a domain and a problem; with general policy approaches [1, 3, 4], which use different types of graph neural network architectures to learn a generalized policy for a classical planning domain; and with heuristic approaches, such as hypergraph networks [5] or relational heuristic networks [6]. The first extension that tries to tackle numeric planning problems is numeric ASNets [7], which builds on the architecture described in [4] and adds the possibility to reason about numeric fluents. In this work, we describe the main issues that need to be addressed in order to handle a numeric planning problem, and we propose to build upon the architecture described by Ståhlberg, Bonet, and Geffner [1]. Even though the scope of that architecture is to learn value functions for general policies, it can easily be used to learn heuristic values, as has already been done in [8]. The paper is organized as follows: first, we cover the background (numeric planning and GNNs); then we describe the architecture that we want to extend; finally, we explain the main issues that must be addressed to handle numeric planning problems.</p>
      </abstract>
      <kwd-group>
        <kwd>Numeric planning</kwd>
        <kwd>Graph Neural Networks</kwd>
        <kwd>Heuristic search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Numeric Planning</title>
      <p>A numeric planning problem is a tuple Π = ⟨s₀, V, A, G⟩, where V is the set of variables, which can be either propositional or numeric, s₀ is the initial state of the problem, A is a set of actions, either numeric or propositional, and G is the goal of the problem, which consists of a set of propositional and numeric goal conditions.</p>
      <p>An action a ∈ A is a pair ⟨pre(a), eff(a)⟩, in which pre(a) represents the set of preconditions that must be verified in a state s before executing the action, and eff(a) represents the set of effects of the action that will be applied to the actual state to reach the next state.</p>
      <p>A plan π = ⟨a₁, ..., aₙ⟩ is a sequence of actions, and is considered a solution for the planning problem if applying the sequence of actions, starting from the initial state of the problem, leads to a final state in which all the goals of the problem are satisfied.</p>
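      <p>To make these semantics concrete, the following is a minimal Python sketch of numeric state/action semantics and of the solution check above; the representation (dictionaries for states, callables for preconditions and effects) is our own illustrative choice, not a fragment of any specific planner:</p>
      <preformat>
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A state maps variable names to boolean or numeric values; preconditions are
# predicates over states, effects are functions from states to states.
State = Dict[str, Union[bool, float]]

@dataclass
class Action:
    name: str
    pre: Callable[[State], bool]   # pre(a): must hold in s before applying a
    eff: Callable[[State], State]  # eff(a): produces the successor state

def is_solution(s0: State, plan: List[Action], goal: Callable[[State], bool]) -> bool:
    """Check the solution condition: applying the plan from the initial state
    must lead to a final state in which all goal conditions are satisfied."""
    s = dict(s0)
    for a in plan:
        if not a.pre(s):
            return False
        s = a.eff(s)
    return goal(s)

# Example: a numeric action that increments a counter x by 1 while x &lt; 3.
inc = Action("inc", pre=lambda s: s["x"] &lt; 3,
             eff=lambda s: {**s, "x": s["x"] + 1})
print(is_solution({"x": 0.0}, [inc, inc, inc], goal=lambda s: s["x"] >= 3))  # True
      </preformat>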
    </sec>
    <sec id="sec-3">
      <title>3. Graph Neural Networks</title>
      <p>A Graph Neural Network (GNN) is a type of neural network specifically designed to work with graph-structured data. Traditional neural networks are designed to represent data structured in grids or sequences, but they struggle with real-world datasets structured as graphs, like social networks or molecular structures.</p>
      <p>A Graph Neural Network takes as input a graph G = (V, ℰ) and a set of node features X ∈ ℝ^{d×|V|}, where d represents the number of features for a node, and generates node embeddings z_u, ∀u ∈ V. These node embeddings can then be used to make predictions about nodes, edges, or even the entire graph.</p>
      <p>The nodes in the GNN exchange information during a message-passing iteration, with an iteration consisting of three operations:
• Message, in which messages are sent from each node towards its neighbors,
• Aggregation, in which all messages sent to a node are aggregated together into a unique message,
• Update, in which the node representation is updated using the aggregated message.</p>
      <p>Each node u in the network has a hidden embedding h_u^{(k)} that is updated after each iteration according to information aggregated from the neighborhood of u. The update of the embedding of a node u can be written as:</p>
      <p>m_{N(u)}^{(k)} = AGGREGATE^{(k)}({h_v^{(k)}, ∀v ∈ N(u)}) (1)</p>
      <p>h_u^{(k+1)} = UPDATE^{(k)}(h_u^{(k)}, m_{N(u)}^{(k)}) (2)</p>
      <p>where UPDATE and AGGREGATE are arbitrary differentiable functions, and m_{N(u)} is the aggregation of the messages that flow from u’s neighborhood towards u [9]. At each iteration k of the GNN, the AGGREGATE function takes as input, for each node, the set of node embeddings in its neighborhood and computes an aggregated message m_{N(u)}^{(k)} = AGGREGATE^{(k)}({h_v^{(k)}, ∀v ∈ N(u)}), ∀u ∈ V. The UPDATE function then combines the previous node embedding with the message generated by the AGGREGATE function for each node, to generate a new embedding h_u^{(k+1)} = UPDATE^{(k)}(h_u^{(k)}, m_{N(u)}^{(k)}), ∀u ∈ V.</p>
      <p>After K iterations of the GNN, the output of the final layer defines the embeddings for each node:</p>
      <p>z_u = h_u^{(K)}, ∀u ∈ V. (3)</p>
      <sec id="sec-3-1">
        <title>3.1. Node and Graph classification</title>
        <p>GNNs are used for three main tasks: node classification, graph classification, and relation (or link) prediction.</p>
        <p>Node classification is the most common task: the network is trained to classify each node, outputting a one-hot vector indicating the possible class of the node.</p>
        <p>Graph classification is similar to node classification; the main difference is that instead of classifying nodes, the GNN is trained to classify the whole graph or part of it. To do that, the node embeddings of the final layer (eq. 3) are used as input to a READOUT function that summarizes the node embeddings and classifies the graph. Lately, this has also been extended with success to graph regression; in these instances, instead of a softmax classification loss function, a squared-error loss function is usually used. In order to learn either a value function for general policies or a heuristic for planning, the task of the GNN is graph regression.</p>
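        <p>As a minimal sketch (sum pooling and a one-hidden-layer MLP are chosen purely for illustration, not taken from [1]), a READOUT for graph regression can look as follows:</p>
        <preformat>
import numpy as np

def readout_value(Z, W1, b1, w2, b2):
    """READOUT for graph regression: summarize the final node embeddings z_u
    into a single vector, then map it to a scalar with a small MLP."""
    g = Z.sum(axis=0)                      # sum-pool over all node embeddings
    hidden = np.maximum(0.0, W1 @ g + b1)  # one hidden ReLU layer
    return float(w2 @ hidden + b2)         # scalar output, e.g. V(s)

def squared_error_loss(pred, target):
    # For graph regression a squared-error loss replaces the softmax loss.
    return (pred - target) ** 2
        </preformat>
      </sec>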
    </sec>
    <sec id="sec-4">
      <title>4. Graph Neural Networks for Classical Planning</title>
      <p>The GNN architecture for learning heuristic values follows the one used by Ståhlberg, Bonet and Geffner [1, 3] for learning value functions, in which, instead of a graph, our networks represent relational structures.</p>
      <p>This is because a state s in planning doesn’t represent a graph, but rather a relational structure over the objects of the problem. The relational structure for a state s is defined by the set of objects, the set of domain predicates, and the atoms p(o₁, ..., oₘ) that are true in s. In particular, the set of domain predicates is fixed for a specific domain, the set of objects may change from one instance to another, and the value of the atoms is specific to the state s. These considerations lead to a structure in which the objects represent the nodes of our network, the predicates are the possible types of edges, and the atoms define the adjacency: if p(o₁, o₂) is true in a state s, the relational structure for s will have an edge of type p between o₁ and o₂. Technically, instead of messages flowing directly from objects to objects, they flow from the objects towards the atoms q in which they are involved, and from q to the objects involved in q.</p>
      <p>The types of edges in the relational structure for a particular planning domain are given by the lifted predicates of the domain and the goal predicates. The GNNs map planning states s into real values V(s). A practical example with blocksworld (arities in parentheses):
• Predicates: on(2), handempty(0), ontable(1), clear(1), holding(1)
• Goal predicates: on_goal(2), handempty_goal(0), ontable_goal(1), clear_goal(1), holding_goal(1)
• Objects: blocks a, b, c, d
• Actions: pickup(?b), putdown(?b), stack(?b1,?b2), unstack(?b1,?b2)
• Current state: (ontable a) (on b a) (on c b) (on d c) (clear d) (handempty)
• Goal: (on b a) (on c b) (on a d)
As we can see in the example in Fig. 1, the nodes of the GNN are the objects of the planning problem, but instead of having a graph in which messages flow from an object towards its neighbors, we have a relational structure in which messages flow from objects o_i to the true atoms q in s that include o_i, with q = p(o₁, ..., oₘ), 1 ≤ i ≤ m, and from such atoms q to all the objects o_i involved in q (Fig. 2).</p>
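      <p>As a small illustrative sketch (the data layout is our own choice, not the implementation of [1]), the object–atom structure over which messages flow can be built for the current state above as follows:</p>
      <preformat>
# Object/atom message-passing structure for the blocksworld state
# (ontable a) (on b a) (on c b) (on d c) (clear d) (handempty).
objects = ["a", "b", "c", "d"]
atoms = [("ontable", ("a",)), ("on", ("b", "a")), ("on", ("c", "b")),
         ("on", ("d", "c")), ("clear", ("d",)), ("handempty", ())]

# Messages flow from each object o_i to every true atom q = p(o_1, ..., o_m)
# that involves o_i, and back from q to all the objects involved in q.
edges = [(obj, q_idx)
         for q_idx, (_pred, args) in enumerate(atoms)
         for obj in args]
print(edges[:4])  # [('a', 0), ('b', 1), ('a', 1), ('c', 2)]
      </preformat>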
      <p>The aggregation functions used follow those of [1, 3], and the same aggregation function is applied to all the nodes in the GNN (i.e., to all the objects in the problem) in every layer. The same update function is then applied to every node in every layer, combining the previous embedding of the node with the array given as output by the aggregation function for that node.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Extension to Numeric Planning</title>
      <p>In classical planning, in order to uniquely describe a state for a particular problem, it is enough to know the truth values of the predicates in the current state and the predicates that must be true in the goal. While in classical planning action preconditions, action effects and goals can be described only with the truth values of predicates, in numeric planning action preconditions and goals might be described as numeric conditions, and action effects as numeric assignments. For example, a goal could be f(o₁) + 3 &lt; f(o₂), where f is a numeric fluent.</p>
      <p>Therefore, to extend the previous architecture to numeric planning, different issues must be addressed:
• Handling numeric fluents is necessary to obtain an input state for the GNN that can uniquely represent a state of our numeric planning problem (one purely illustrative encoding is sketched after this list),
• Numeric goals must have a different representation, since they can’t be directly represented by the numeric fluents of the problem,
• Action preconditions might also be taken into account; this wasn’t useful before, since they were truth values of the boolean predicates, but now they represent different numeric conditions and hence they might be useful information to exploit in our architecture.</p>
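      <p>As a purely illustrative sketch of the first two issues, one possible encoding, which is our own assumption and not a design settled in this work, attaches the value of each numeric fluent to its atom node and turns each numeric goal condition into an extra node whose feature measures how far the condition is from being satisfied:</p>
      <preformat>
import numpy as np

# Purely illustrative encoding (our assumption, not a settled design): numeric
# fluents become atom-node features, and each numeric goal condition becomes an
# extra node carrying how far the condition currently is from being satisfied.

def fluent_node_feature(value: float) -> np.ndarray:
    # Feature vector of a numeric-fluent atom node: [is_numeric flag, value].
    return np.array([1.0, value])

def numeric_goal_node_feature(lhs: float, rhs: float) -> np.ndarray:
    # For a goal condition lhs &lt; rhs, e.g. f(o₁) + 3 &lt; f(o₂): the feature is
    # the amount still missing before the condition holds (0 if satisfied).
    return np.array([max(0.0, lhs - rhs)])

# With f(o₁) = 2 and f(o₂) = 4, the goal f(o₁) + 3 &lt; f(o₂) is violated by 1:
print(numeric_goal_node_feature(2 + 3, 4))  # [1.]
      </preformat>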
      <p>Our goal is to extend the existing architecture to solve these issues, and then to train our models with a dataset generated using ENHSP [10] as a teacher, with optimal plans when possible given the domain, and with the best combination of search algorithm and heuristic when finding the optimal solution is not feasible.</p>
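      <p>The dataset generation can be pictured as follows; the sketch assumes hypothetical solve, initial_state and successor helpers (they stand for a wrapper around the teacher planner and a state-transition model, and are not part of ENHSP's actual interface):</p>
      <preformat>
from typing import Callable, Dict, List, Tuple

def build_dataset(problems: List[str],
                  solve: Callable[[str], List[str]],
                  initial_state: Callable[[str], Dict],
                  successor: Callable[[Dict, str], Dict]) -> List[Tuple[Dict, int]]:
    """Label every state along a teacher plan with its cost-to-go: the i-th
    state of an n-step plan gets target n - i, which is the value the GNN is
    trained to regress to. `solve` stands for a wrapper around the teacher
    planner (e.g. ENHSP); its interface here is assumed for illustration."""
    dataset: List[Tuple[Dict, int]] = []
    for problem in problems:
        plan = solve(problem)                # teacher plan for this instance
        state = initial_state(problem)
        n = len(plan)
        for i, action in enumerate(plan):
            dataset.append((state, n - i))   # n - i actions left to the goal
            state = successor(state, action)
        dataset.append((state, 0))           # the goal state has cost-to-go 0
    return dataset
      </preformat>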
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary results</title>
      <p>We are working on different approaches to handle numeric fluents and distances. While we still have to test the new architectures that we created within a planner, we already have some interesting results regarding the models obtained; in Table 1 we show the minimum squared error obtained after training the models, using the aggregation functions that performed best on First-Order Counters and on Block Grouping, respectively.</p>
      <p>If we consider that suboptimal plans can have hundreds of actions in those domains, especially in First-Order Counters, an MSE of 0.16 is a good result.</p>
      <p>From here, the next step will be testing the models inside a planner, to compare them to traditional
numeric heuristics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Learning generalized policies without supervision using GNNs</article-title>
          ,
          <source>arXiv preprint arXiv:2205.06002</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rossetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tummolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Gerevini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Putelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Serina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olivato</surname>
          </string-name>
          ,
          <article-title>Learning general policies for planning through GPT models</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>34</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>500</fpage>
          -
          <lpage>508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>32</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>629</fpage>
          -
          <lpage>637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Toyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Trevizan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Action schema networks: Generalised policies with deep learning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Trevizan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <article-title>Learning domain-independent planning heuristics with hypergraph networks</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>30</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>574</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Karia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Learning generalized relational heuristic networks for model-agnostic planning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>8064</fpage>
          -
          <lpage>8073</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <article-title>Learning generalised policies for numeric planning</article-title>
          ,
          <source>in: Proceedings of the International Conference on Automated Planning and Scheduling</source>
          , volume
          <volume>34</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>633</fpage>
          -
          <lpage>642</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ståhlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Muninn</article-title>
          ,
          <source>Tenth International Planning Competition (IPC-10) Learning Track: Planner Abstracts</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <source>Graph representation learning</source>
          , Morgan &amp; Claypool Publishers,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haslum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramirez</surname>
          </string-name>
          ,
          <article-title>Interval-based relaxation for general numeric planning</article-title>
          ,
          <source>in: ECAI</source>
          <year>2016</year>
          , IOS Press,
          <year>2016</year>
          , pp.
          <fpage>655</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>