
R-PlanGPT: Neuro-Symbolic Plan Generation via Transformer-based Language Models

Massimiliano Tummolo

Mattia Chiari

Luca Putelli

Nicholas Rossetti

Ivan Serina

Alfonso Emilio Gerevini

University of Brescia, via Branze 38, Brescia, Italy

2025

R-PlanGPT is a neuro-symbolic architecture designed to generate solution plans for classical planning problems by learning from examples. It combines a generative model (PlanGPT), a symbolic validator (VAL) and a classical planner (LPG). PlanGPT learns to solve new instances within the same domain as a general policy, treating planning as a generative task and producing action sequences given the initial state and the goal of a problem. In order to guarantee the correctness of the neural model output, VAL is called to validate every plan produced by PlanGPT. If the solution is not valid, it is repaired by LPG. We demonstrate the capabilities of R-PlanGPT on standard planning benchmarks, highlighting its ability to generate valid, high-quality plans.

1. Introduction

Recent advances in Large Language Models (LLMs) have yielded remarkable performance across various natural language processing tasks [1, 2]. Beyond these tasks, LLMs are increasingly applied to mathematical inference [3] and code writing [4]. In terms of reasoning abilities, although it is generally accepted that these models are capable of common-sense reasoning [5], an important benchmark for these abilities is automated planning [6] and, in particular, solving planning problems [7, 8, 9, 10].

In this demo, we present R-PlanGPT, a neuro-symbolic architecture that addresses plan generation as a sequence modeling task using a GPT-based architecture. As shown in Figure 1, the system is composed of three main modules. The first is PlanGPT, a GPT model trained from scratch on classical planning problems (expressed in PDDL [11]), which learns to generate action sequences that solve planning instances in a given domain. Since there is no theoretical guarantee that the neural model produces a correct solution, R-PlanGPT includes a validator [12, 13] to ensure plan soundness. If the solution is not valid, R-PlanGPT invokes a classical planner [14] to repair the output of the neural model and provide a valid plan [15]. We test R-PlanGPT on several benchmark domains from the International Planning Competition (IPC) [16]. Our results show that, although the GPT model achieves good results by itself, the inclusion of the symbolic components further increases performance, making R-PlanGPT capable of generating valid, high-quality plans.

2. Background

Automated Planning is a branch of Artificial Intelligence focused on generating a sequence of actions (a plan) that an agent can perform to transition from an initial state to a goal state, given a formal model of the domain [17]. While classical planners focus on solving individual problem instances, Generalized Planning (GP) instead seeks to derive general policies that solve several problems within a domain. A general policy [18] is a mapping from states (or observations) to actions, enabling an agent to solve previously unseen problems without having to compute each solution from scratch. For instance, in the well-known blocksworld domain, a general policy might instruct the agent to “clear all blocks and then stack them in goal order,” regardless of the number of blocks involved. Recent learning-based approaches have explored extracting general policies from solved examples, often using neural networks such as CNNs or GNNs [19, 20, 21]. However, these approaches typically require domain-specific encodings and offer limited expressivity or scalability. In contrast, Transformer-based language models have demonstrated strong capabilities in sequence modeling tasks and show potential for learning policies from data without handcrafted features [8, 10, 22].
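To make the notion of a general policy concrete, the following minimal sketch implements the blocksworld rule quoted above as a mapping from states to moves. It is an illustration only and not part of R-PlanGPT: the state encoding (a dictionary from each block to its support), the abstract move action, and the names `policy`, `run_policy`, and `is_settled` are all assumptions made for this example. Blocks without an explicit goal are assumed to belong on the table.

```python
TABLE = "table"

def is_clear(block, on):
    """A block is clear when no other block rests on it."""
    return all(support != block for support in on.values())

def is_settled(block, on, goal_on):
    """A block is settled when it and everything below it match the goal."""
    if block == TABLE:
        return True
    if on[block] != goal_on.get(block, TABLE):
        return False
    return is_settled(on[block], on, goal_on)

def policy(on, goal_on):
    """Map a state to the next (block, destination) move, or None when done."""
    movable = [b for b in sorted(on)
               if is_clear(b, on) and not is_settled(b, on, goal_on)]
    # Phase 1: put every misplaced clear block on the table.
    for block in movable:
        if on[block] != TABLE:
            return (block, TABLE)
    # Phase 2: stack a block once its goal support is settled and clear.
    for block in movable:
        target = goal_on[block]
        if is_settled(target, on, goal_on) and is_clear(target, on):
            return (block, target)
    return None

def run_policy(on, goal_on):
    """Apply the policy repeatedly until it returns no further move."""
    on = dict(on)
    moves = []
    while (move := policy(on, goal_on)) is not None:
        block, destination = move
        on[block] = destination
        moves.append(move)
    return moves, on
```

The same `policy` function applies unchanged to towers of any height, which is the property that distinguishes a general policy from a plan computed for a single instance.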

3. Methodology

This section details the pipeline for training and using R-PlanGPT for classical planning tasks. As shown in Figure 1, the system is composed of a Transformer-based model (PlanGPT [23]), a symbolic validator (VAL [12]), and the LPG planner [14].

First, we create a dataset composed of solved planning instances from classical domains written in PDDL to train the neural component; these are generated using standard domain-specific generators, following the IPC conventions to ensure a range of complexity. Each problem is solved using the LPG planner, and we collect up to four plans per instance to provide diversity. To avoid overfitting to naming conventions, object names in problems and plans are randomized.
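The object-name randomization step can be sketched as follows. This is a hedged illustration, not the authors' preprocessing code: the function name `randomize_object_names`, the `name_pool` argument, and the string-based renaming are assumptions, and the sketch treats names as case-sensitive for simplicity.

```python
import random
import re

def randomize_object_names(problem_text, plan, objects, name_pool, rng):
    """Consistently rename objects in a problem and in its plan.

    `name_pool` is a hypothetical list of replacement names and must contain
    at least as many names as there are objects.
    """
    if not objects:
        return problem_text, list(plan), {}
    mapping = dict(zip(objects, rng.sample(name_pool, len(objects))))
    # Longest names first, so that a name never matches inside a longer one.
    alternatives = "|".join(
        re.escape(name) for name in sorted(mapping, key=len, reverse=True)
    )
    # Match whole identifiers only (PDDL names may contain '-' and '_').
    pattern = re.compile(r"(?<![\w-])(?:" + alternatives + r")(?![\w-])")

    def rename(text):
        # A single pass avoids chained renaming (e.g. a -> b followed by b -> c).
        return pattern.sub(lambda match: mapping[match.group(0)], text)

    return rename(problem_text), [rename(step) for step in plan], mapping
```

Applying the same mapping to the problem and to each of its plans keeps every training pair consistent while removing any regularity tied to specific object names.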

Next, we train the neural component of the system, PlanGPT, from scratch. PlanGPT is based on a GPT architecture and receives as input a textual prompt encoding the initial state and goal of the planning problem. The model is trained with a standard cross-entropy loss to predict the next token in a plan sequence. To prevent overfitting, we use a custom early stopping criterion called Coverage Early Stopping [23], which terminates training when the percentage of valid plans generated on the validation set stabilizes.
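A coverage-based stopping rule of this kind could look like the sketch below. The exact criterion is defined in [23]; here, "stabilizes" is interpreted as "no improvement larger than `min_delta` for `patience` consecutive evaluations", and the class name, the parameters, and their default values are assumptions made for illustration.

```python
def coverage(valid_flags):
    """Percentage of validation problems for which the generated plan is valid."""
    return 100.0 * sum(valid_flags) / len(valid_flags)

class CoverageEarlyStopping:
    """Stop when validation coverage has stopped improving (assumed criterion)."""

    def __init__(self, patience=3, min_delta=0.5):
        self.patience = patience    # evaluations to wait without improvement
        self.min_delta = min_delta  # minimum gain (in points) counted as improvement
        self.best = float("-inf")
        self.stale = 0

    def update(self, current_coverage):
        """Record a new coverage value; return True if training should stop."""
        if current_coverage > self.best + self.min_delta:
            self.best = current_coverage
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

In a training loop, `update` would be called after each validation round with the coverage computed from the validator's verdicts on the generated plans.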

At inference time, the trained PlanGPT autoregressively generates grounded action sequences. However, PlanGPT can generate actions with unmet preconditions or fail to achieve all the goals, thus producing an invalid plan. To check whether a plan produced by PlanGPT is a valid solution, we incorporate a symbolic validator (VAL) that assesses the correctness of each generated action and verifies that every goal is satisfied at the end of the generation. To prevent the model from generating non-applicable actions, we further introduce the Validated Multi-Beam Search (VAL-MB) strategy, which integrates VAL into the decoding process by validating candidate actions during generation and pruning invalid beams on the fly. Finally, if the generated output is invalid or incomplete, we apply a plan repair strategy using LPG. In this setup, a valid plan prefix is extracted and provided as a seed to LPG, which continues the search and produces a complete solution, combining the strengths of neural generation with symbolic reasoning.
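The validation and prefix-extraction steps can be illustrated with a toy STRIPS-style simulator. This sketch stands in for VAL and does not call it: the `Action` dataclass and the function `longest_valid_prefix` are hypothetical names, and states are assumed to be plain sets of ground facts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A ground STRIPS action (illustrative stand-in for a PDDL action)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

def longest_valid_prefix(initial_state, goal, plan):
    """Simulate `plan` and return (is_valid, prefix, state).

    `prefix` holds the actions executed before the first precondition
    violation; `is_valid` is True only if the whole plan is applicable
    and the final state satisfies every goal fact.
    """
    state = set(initial_state)
    prefix = []
    for action in plan:
        if not action.preconditions <= state:
            return False, prefix, state  # first violated precondition
        state = (state - action.del_effects) | action.add_effects
        prefix.append(action)
    return set(goal) <= state, prefix, state
```

When `is_valid` is False, the returned `prefix` is the part of the plan that could be passed on as a seed for repair, as described in the paragraph above.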

4. System Demonstration

As shown in Figure 2, the demo presents an interactive system designed to showcase how to generate valid plans using R-PlanGPT. The platform allows users to interact with the whole pipeline, from problem specification to plan generation and validation.

The user begins by selecting a classical planning domain from a predefined set (e.g., blocksworld, logistics) and uploading a corresponding PDDL problem file, which includes the initial state and goal. The system automatically parses the uploaded file and verifies its syntactic correctness against the domain definition to ensure compatibility. This procedure includes two key steps: (i) a conversion mechanism that remaps object names not present in the model’s vocabulary to randomly selected placeholder names from the vocabulary, ensuring compatibility with the model’s tokenization; and (ii) a check on the number of objects present in the problem. Since PlanGPT is trained with a fixed vocabulary size and limited object capacity, the system ensures that the number of objects in the problem does not exceed the maximum supported. If either check fails, the user receives an error message and is asked to revise the input.
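The two input checks can be sketched as a single function. The names `remap_objects`, `vocabulary`, and `max_objects` are assumptions for this example, and the vocabulary is taken to be the list of object names the model was trained on; the demo's actual implementation may differ.

```python
import random

def remap_objects(problem_objects, vocabulary, max_objects, rng):
    """Check object capacity and map unknown names to unused vocabulary names.

    Returns a dict from each original object name to the name the model sees.
    Raises ValueError when the problem exceeds the supported capacity.
    """
    if len(problem_objects) > max_objects:
        raise ValueError(
            f"{len(problem_objects)} objects given, at most {max_objects} supported"
        )
    known = set(vocabulary)
    # Names already in the vocabulary are kept as they are.
    mapping = {name: name for name in problem_objects if name in known}
    unknown = [name for name in problem_objects if name not in known]
    free = [name for name in vocabulary if name not in mapping]
    if len(unknown) > len(free):
        raise ValueError("not enough unused vocabulary names for remapping")
    # Unknown names receive distinct, randomly chosen placeholders.
    mapping.update(zip(unknown, rng.sample(free, len(unknown))))
    return mapping
```

Keeping the mapping makes it possible to translate the generated plan back to the user's original object names before it is displayed.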

Once validated, the user can trigger the plan generation step using PlanGPT. The system supports various generation strategies: greedy decoding, multi-beam search, sampling, or the Validated Multi-Beam approach. In VAL-MB, the symbolic validator is invoked at each generation step to discard invalid actions on the fly, enforcing plan soundness during decoding. For other decoding strategies, validation is performed after the complete plan has been generated. If the generated plan is deemed valid by the VAL tool, it is directly displayed to the user as a viable solution.
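The idea of pruning invalid candidates during decoding can be illustrated with a simplified, action-level beam search. This is a sketch under strong assumptions and not the VAL-MB algorithm of [13]: the real system decodes tokens with PlanGPT and calls VAL, whereas here `propose` is a hypothetical stand-in that returns scored candidate actions, and applicability is checked with a toy STRIPS model.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A ground STRIPS action (illustrative stand-in for a PDDL action)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

def validated_beam_search(initial_state, goal, propose, beam_width=2, max_steps=10):
    """Beam search that discards non-applicable actions at every step.

    `propose(state, plan)` returns (action, log_probability) pairs and plays
    the role of the neural model. Returns a valid plan or None.
    """
    goal = frozenset(goal)
    beams = [(0.0, [], frozenset(initial_state))]
    for _ in range(max_steps):
        candidates = []
        for score, plan, state in beams:
            for action, log_prob in propose(state, plan):
                if not action.preconditions <= state:
                    continue  # pruned: a precondition is not satisfied
                new_state = (state - action.del_effects) | action.add_effects
                candidates.append((score + log_prob, plan + [action], new_state))
        if not candidates:
            return None
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:beam_width]
        for _, plan, state in beams:
            if goal <= state:
                return plan  # every action applicable and all goals reached
    return None
```

Because every surviving beam contains only applicable actions, a returned plan is valid by construction in this toy model, which mirrors the role the validator plays during decoding.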

Otherwise, the validator identifies the first precondition violation, and the system extracts the longest valid plan prefix. At this point, the user is offered the option to invoke LPG to repair or complete the plan. LPG uses the prefix as a seed to guide its search process, producing a valid plan more efficiently than starting from scratch.

[Figure 2: screenshot of the demo interface, showing the Model Selection, Input Problem and Solve Problem panels, the Load and Submit buttons, and an example blocksworld problem (BLOCKS-4-0, domain BLOCKS) with four blocks A, B, C, D initially clear on the table and goal (AND (ON D C) (ON C B) (ON B A)).]

Once the solution is obtained, the output is displayed, and the user is shown the generated plan, a summary of the selected domain and problem, and the validation results if requested.

5. System Evaluation

We evaluate R-PlanGPT on a suite of classical planning benchmarks, demonstrating its effectiveness. The IPCScoreQuality (IPCQ) metric evaluates the quality of the solutions produced by a planning system by comparing each plan’s cost to that of the best-known plan for the same problem. Higher scores indicate plans that are closer to optimal, while unsolved problems score zero. As shown in Table 1, R-PlanGPT outperforms all other approaches in terms of IPCQ, achieving the highest overall score of 175.00 on the IPC domains. This result demonstrates that combining PlanGPT with LPG as a post-repair step leads to plans of higher quality than either PlanGPT or symbolic planners alone. Although the neural model performs well by itself, integrating a symbolic planner improves results in domains where PlanGPT alone struggles, such as logistics, satellite, and depots. Moreover, R-PlanGPT matches the performance of LPG and LAMA in most domains.
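As a worked illustration of the metric, the sketch below assumes the usual IPC convention in which a solved problem scores the ratio between the best-known cost and the cost of the produced plan; the text above does not spell out the formula, so this ratio is an assumption. The function name and the dictionary-based inputs are hypothetical, and costs are assumed to be positive.

```python
def ipc_quality_score(plan_costs, best_known_costs):
    """Sum per-problem quality scores over a benchmark set.

    `plan_costs` maps a problem name to the cost of the plan found, or to
    None when the problem was not solved. Each solved problem contributes
    best_known_cost / plan_cost (at most 1.0); unsolved problems contribute 0.
    """
    total = 0.0
    for problem, best_cost in best_known_costs.items():
        cost = plan_costs.get(problem)
        if cost is None:
            continue  # unsolved problem: score zero
        total += best_cost / cost
    return total
```

For example, a system that matches the best-known plan on one problem, finds a plan twice as costly on a second, and fails on a third would obtain 1.0 + 0.5 + 0.0 = 1.5 under this definition.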

6. Conclusion

In this demo, we have presented R-PlanGPT, a system that integrates language models with symbolic tools for solving classical planning problems. Future developments will focus on supporting nondeterministic planning, incorporating macro-actions and temporal logic, enhancing interpretability, and exploring other forms of neuro-symbolic integration [24, 25, 26].

7. Acknowledgements

This work has been supported by: MUR (Italian Ministry of University and Research) PRIN-2020 project RIPER (n. 20203FFYLK); PNRR MUR project PE0000013-FAIR, cascade funding call, ResilientPlans; AI4WATER project, part of the PRIMA Programme supported by the European Union and by MUR; and by Regione Lombardia through the initiative "Programma degli interventi per la ripresa economica: sviluppo di nuovi accordi di collaborazione con le università per la ricerca, l’innovazione e il trasferimento tecnologico" - DGR n. XI/4445/2021. The author(s) have not employed any Generative AI tools.

References

[1] Radford, Narasimhan, Improving language understanding by generative pre-training, in: preprint, 2018. api.semanticscholar.org/CorpusID:49313245.

[2] Vaswani, Shazeer, Parmar, Uszkoreit, Jones, A. N. Gomez, Kaiser, I. Polosukhin, Attention is all you need, in: NIPS, 2017, pp. 5998-6008.

[3] Wei, Wang, Schuurmans, Bosma, Ichter, Xia, E. H. Chi, Q. V. Le, Zhou, Chain-of-thought prompting elicits reasoning in large language models, in: NeurIPS, 2022.

[4] Wang, S. R. Joty, S. C. H. Hoi, Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation, in: EMNLP (1), Association for Computational Linguistics, 2021, pp. 8696-8708.

[5] Geva, Khashabi, E. Segal, Khot, Roth, Berant, Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies, Trans. Assoc. Comput. Linguistics 9 (2021) 346-361.

[6] Pallagani, B. C. Muppasani, Roy, Fabiano, Loreggia, Murugesan, Srivastava, Rossi, Horesh, A. P. Sheth, On the prospects of incorporating large language models (llms) in automated planning and scheduling (APS), in: ICAPS, AAAI Press, 2024, pp. 432-444.

[7] Chiari, Putelli, Rossetti, I. Serina, A. E. Gerevini, On planning through llms, in: ICAPS, AAAI Press, 2025.

[8] Pallagani, Muppasani, Srivastava, Rossi, Horesh, Murugesan, Loreggia, Fabiano, Joseph, Kethepalli, Plansformer tool: Demonstrating generation of symbolic plans using transformers, in: IJCAI, IJCAI Org., 2023, pp. 7158-7162.

[9] Serina, Chiari, A. E. Gerevini, Putelli, I. Serina, A preliminary study on BERT applied to automated planning, in: IPS/RiCeRcA/SPIRIT@AI*IA, volume 3345 of CEUR Workshop Proceedings, CEUR-WS.org, 2022.

[10] Valmeekam, Marquez, A. O. Hernandez, Sreedharan, Kambhampati, Planbench: An extensible benchmark for evaluating large language models on planning and reasoning about change, in: NeurIPS, 2023.

[11] McDermott, Ghallab, A. E. Howe, C. A. Knoblock, Ram, M. M. Veloso, D. S. Weld, D. E. Wilkins, PDDL - the planning domain definition language, 1998. URL: https://api.semanticscholar.org/CorpusID:59656859.

[12] Howey, Long, M. Fox, VAL: automatic plan validation, continuous effects and mixed initiative planning using PDDL, in: ICTAI, IEEE Computer Society, 2004, pp. 294-301.

[13] Rossetti, Tummolo, A. E. Gerevini, Putelli, I. Serina, Olivato, Enhancing gpt-based planning policies by model-based plan validation, Proceedings of the 18th International Conference on Neural-Symbolic Learning and Reasoning (2024).

[14] Gerevini, I. Serina, LPG: A planner based on local search for planning graphs with action costs, in: AIPS, AAAI Press, 2002, pp. 13-22.

[15] Tummolo, Rossetti, A. E. Gerevini, Putelli, I. Serina, Olivato, Integrating classical planners with gpt-based planning policies, in: AI*IA, Lecture Notes in Computer Science, Springer, 2024.

[16] Taitler, Alford, Espasa, Behnke, Fiser, Gimelfarb, Pommerening, Sanner, Scala, Schreiber, Segovia-Aguas, J. Seipp, The 2023 international planning competition, AI Mag. 45 (2024) 280-296.

[17] Ghallab, D. S. Nau, P. Traverso, Automated planning - theory and practice, Elsevier, 2004.

[18] Hu, G. De Giacomo, Generalized planning: Synthesizing plans that work for multiple environments, in: IJCAI, IJCAI Org., 2011, pp. 918-923.

[19] Groshev, Goldstein, Tamar, Srivastava, Abbeel, Learning generalized reactive policies using deep neural networks, in: ICAPS, AAAI Press, 2018, pp. 408-416.

[20] Ståhlberg, Bonet, Geffner, Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits, in: ICAPS, AAAI Press, 2022, pp. 629-637.

[21] Toyer, Thiébaux, F. W. Trevizan, L. Xie, Asnets: Deep learning for generalised planning, J. Artif. Intell. Res. 68 (2020) 1-68.

[22] Silver, Dan, Srinivas, J. B. Tenenbaum, L. P. Kaelbling, Katz, Generalized planning in PDDL domains with pretrained large language models, in: AAAI, AAAI Press, 2024, pp. 20256-20264.

[23] Rossetti, Tummolo, A. E. Gerevini, Putelli, I. Serina, Chiari, Olivato, Learning general policies for planning through gpt models, Proceedings of the International Conference on Automated Planning and Scheduling 34 (2024) 500-508.

[24] Chiari, A. E. Gerevini, Percassi, Putelli, I. Serina, Olivato, Goal recognition as a deep learning task: The grnet approach, in: ICAPS, AAAI Press, 2023, pp. 560-568.

[25] Chiari, A. E. Gerevini, Loreggia, Putelli, I. Serina, Fast and slow goal recognition, in: M. Dastani, J. S. Sichman, N. Alechina, V. Dignum (Eds.), Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, New Zealand, May 6-10, 2024, International Foundation for Autonomous Agents and Multiagent Systems / ACM, 2024, pp. 354-362.

[26] Serina, Chiari, A. E. Gerevini, Putelli, I. Serina, Towards efficient online goal recognition through deep learning, in: AAMAS, International Foundation for Autonomous Agents and Multiagent Systems / ACM, 2025, pp. 1895-1903.