Addressing the Symbol Grounding Problem with Constraints in Neuro-Symbolic Planning

Aymeric Barbin¹,²,*, Federico Cerutti²,³ and Alfonso Emilio Gerevini²
¹ Sapienza Università di Roma, Italy
² Università degli Studi di Brescia, Italy
³ Cardiff University, UK
* Corresponding author: aymeric.barbin@uniroma1.it (A. Barbin); federico.cerutti@unibs.it (F. Cerutti); alfonso.gerevini@unibs.it (A. E. Gerevini)
Italian Workshop on Planning and Scheduling (IPS-2022, 10th edition)

Abstract
In this paper, we address the Symbol Grounding Problem (SGP) in the context of neuro-symbolic planning, where the categorical vectors learned to represent high-dimensional inputs suffer from instability, which harms efficiency during the planning phase. One way to alleviate the SGP is to enforce constraints among the latent variables by expressing them in the loss function during the learning process. Combining an existing tool for invariant search with ideas from Logic Tensor Networks (fuzzy logic), we propose to automate the process of finding and enforcing relevant constraints. We apply our idea to LatPlan, a domain-independent, image-based classical planner.

Keywords
Neuro-Symbolic Planning, Symbol Grounding Problem, Action Model Learning

1. Introduction

The core interest of Domain-Independent Planning [1] is developing general-purpose algorithms and systems that can solve planning problems independently of any domain-specific knowledge. A planning problem can be specified using the Planning Domain Description Language (PDDL) [2] in terms of a symbolic description of the states and actions composing the domain, an initial state, and a goal. When the states of the problem are only available as sub-symbolic data (e.g., images), generating a PDDL description requires addressing the Symbol Grounding Problem (SGP) [3], which here refers to the case where two different inputs (e.g., images) grounding the same symbol (e.g., the digit "1") have different vector representations, revealing a lack of generalisation power.

LatPlan [4] is a neuro-symbolic architecture for planning proposed to address the bottleneck of PDDL construction from raw input data. It leverages Deep Learning to learn the PDDL description from a set of unlabeled pairs of transition images. More specifically, it uses variational autoencoders (VAE) [5] to learn and generate categorical vector representations of the images (states) and of the transitions between them (actions), which are then used to generate the PDDL. LatPlan is also endowed with tools that alleviate the SGP of these representations, whose stability can be assessed by measuring their variance on perturbed input.

Another way to address the SGP is to enforce constraints on the categorical vectors. If we consider a vector $v$ as an ideally grounded symbol and its unstable version $v'$, then $v'$ contains some noise, i.e., variables with undesired values. If we know appropriate constraints to apply during training, the noisy representation $v'$ should converge to $v$. In this paper, we propose to use an automatic tool to look for invariants in the PDDL generated by LatPlan and to test their effect on learning by expressing them as an auxiliary loss term.
To do so, we borrow ideas from Logic Tensor Networks (LTN) [6], where this additional loss is created using t-norm fuzzy logic [7]. In Section 2, we provide background on LatPlan, invariants, and fuzzy logic; then, in Section 3, we discuss our proposed solution to integrate the search for constraints into the training loop.

2. Background

2.1. LatPlan

LatPlan is a neuro-symbolic architecture that receives pairs of images representing transitions and learns categorical vector representations from them. Once trained, it can generate a PDDL representation of the states and actions of the problem, which can be given as input to a planner. For our work, we are interested in improving the learning of two components of LatPlan: the State AutoEncoder network (SAE), which learns to represent images as categorical vectors, and the Action Model Acquisition network (AMA), which learns to represent actions as categorical vectors encoding preconditions and effects. These two models are learned end-to-end from a dataset of pairs of images, each pair representing a valid transition in the sub-symbolic world (e.g., the sliding of a tile in the 8-puzzle).

The SAE learns a bi-directional mapping between sub-symbolic raw data $x$ and propositional states $z \in \{0,1\}^F$ (where $F$ is the number of variables in the categorical vector). Concretely, it consists of the encoder and the decoder of a VAE trained so that $\mathrm{DECODE}(\mathrm{ENCODE}(x)) = x$. The AMA model consists of three networks: ACTION, APPLY and REGRESS. ACTION learns to map two consecutive states (returned by the SAE) to an action (expressed as a one-hot vector). APPLY learns to predict the next categorical state from the previous one and an action. REGRESS is the symmetric counterpart of APPLY: it learns to predict the previous state from the next one and the action. The SAE, APPLY and REGRESS networks all output binary categorical vectors of the same size. Each element of these vectors is interpreted as a binary variable and is represented by a unary predicate in the PDDL files generated by LatPlan.

This representation is negatively affected by the SGP: it can break the identity assumption inherent to symbolic reasoning algorithms (a state must not change its representation) and can cause disconnections during the search process, i.e., if two latent vectors encode what should be the same state, an action might lead to one of them but not to the other, and the latter then becomes a dead end. To address this issue, the authors of LatPlan propose two solutions. First, at test time, they replace the sampling of the categorical vector with an argmax layer, thereby removing stochasticity. Second, during training, they select a prior distribution for the sampling of the categorical vector that favours sparsity of true values among the binary variables. This leads to more stable latent state vectors and improves the model both in next-state prediction accuracy (APPLY network) and in planning performance.

In our work, we are mainly interested in stabilizing the latent state vectors generated by the SAE network; to assess this, we use the same metric as in the LatPlan paper, the State Variance. To compute it, we measure the state variance for noisy input, i.e., the variance of the latent vectors $z_{i,0} = \mathrm{ENCODE}(x_{i,0} + n)$, where:

• $n \sim \mathcal{N}(\mu = 0, \sigma = 0.3)$ is Gaussian noise;
• $x_{i,0}$ is the first image of the $i$-th transition pair of the dataset.

The State Variance is obtained by sampling 10 random noise vectors, then averaging over the $F$ bits of the latent space and over the dataset indexed by $i$. Formally:

$\mathbb{E}_{f \in 0 \ldots F}\; \mathbb{E}_{i}\; \operatorname{Var}_{j \in 0 \ldots 10}\!\left[\mathrm{ENCODE}(x_{i,0} + n_j)_f\right]$   (1)
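For illustration, the sketch below shows how this metric could be computed with NumPy; the encode callable, the dataset array x0, and the tensor shapes are hypothetical stand-ins for LatPlan's actual SAE and data pipeline.

```python
import numpy as np

def state_variance(encode, x0, n_samples=10, sigma=0.3):
    """Estimate the State Variance of Formula (1).

    encode : callable mapping a batch of images to latent vectors of F bits
             (a hypothetical stand-in for LatPlan's SAE ENCODE).
    x0     : array of shape (N, ...) with the first image of each transition pair.
    """
    latents = []
    for _ in range(n_samples):
        noise = np.random.normal(loc=0.0, scale=sigma, size=x0.shape)
        latents.append(encode(x0 + noise))       # shape (N, F)
    latents = np.stack(latents, axis=0)          # shape (n_samples, N, F)
    per_bit_var = latents.var(axis=0)            # variance over the noisy encodings (index j)
    return per_bit_var.mean()                    # average over the dataset (i) and the bits (f)
```

Lower values indicate latent representations that are more stable under input noise.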
2.2. Invariants and constraints in planning

In classical planning, an invariant is defined as a logical formula over the variables of the domain that is true in every reachable state [8]. Invariants can be seen as hidden but logical properties of the domain, and can also be termed state constraints. In LatPlan, we can enforce them while learning the SAE and AMA networks. An important part of the research in classical planning focuses on discovering invariants in planning domains. Today, automatic tools [9][10] already exist that can find a variety of invariants, such as predicate domain invariants (i.e., in the effect of an action), static invariants on predicates (predicates unaffected by any operator), simple implicative invariants (e.g., $z_1 \rightarrow z_2$), mutually exclusive invariants (e.g., $\neg z_1 \vee \neg z_2$), etc.

2.3. Logic Tensor Networks and Fuzzy logic

Extensive work on integrating logical constraints into the training of neural networks has been conducted in the last few years. Notably, one can express constraints among neural network outputs, taken as predicates, thanks to t-norm operations and the fuzzy generalisation of First Order Logic (FOL) [11][7]. For example, if we want to express the truth value of $\neg z_1 \vee \neg z_2$, we can compute it as its average over the dataset, i.e.,

$\frac{1}{|Z|} \sum_{z_1, z_2 \in Z} \left[(1 - P(z_1)) + (1 - P(z_2))\right]$   (2)

where $P(z_1)$ is the (continuous) truth value of one grounding of "$z_1$ is true", and $Z$ is the set of all groundings. The inverse of this value can be directly appended as a penalty to the loss function, which eventually forces the network to adapt its weights to this new constraint.
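As a concrete illustration, the snippet below sketches how such a fuzzy truth value could be computed and turned into a penalty term with TensorFlow. The function names are ours; the clipping at 1 (the Łukasiewicz disjunction) is an optional choice we add to keep the value in [0, 1], which the simple additive form of Formula (2) does not guarantee.

```python
import tensorflow as tf

def fuzzy_not_or_not(p_z1, p_z2):
    """Fuzzy truth value of (not z1) or (not z2) for each grounding in a batch.

    Follows the additive form of Formula (2); tf.minimum clips at 1 so the
    result stays a valid truth value (Lukasiewicz disjunction).
    """
    return tf.minimum(1.0, (1.0 - p_z1) + (1.0 - p_z2))

def constraint_loss(p_z1, p_z2, weight=1.0):
    """Penalty to append to the training loss: 0 when the constraint holds for
    every grounding in the batch, up to `weight` when it never does."""
    truth = tf.reduce_mean(fuzzy_not_or_not(p_z1, p_z2))  # average over the groundings Z
    return weight * (1.0 - truth)
```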
3. Automatically finding and enforcing relevant invariant candidates

We first discuss how we intend to search for invariants of interest among LatPlan's binary variables with the help of an automatic tool; then, we discuss how to enforce these invariants during training and give details about integrating the search into the training loop.

3.1. Searching for invariants of interest

Automatic tools, like Fast Downward (FD) [10] and DISCOPLAN [9], can find invariants in the PDDL representation output by LatPlan using invariant synthesis. However, since LatPlan learns the PDDL by statistical inference, there is no guarantee that any of the invariants found in this PDDL maps to a ground truth invariant, i.e., to an invariant of the (unknown) ground truth PDDL domain. For example, if $z_1 \rightarrow z_2$ is returned by FD as an invariant, but $z_1$ is true only once in the whole dataset, it is possible that $z_1 \rightarrow z_2$ would be falsified if we had more data with $z_1$ being true. Thus, in our work we consider the invariants computed by an invariant generation tool (on LatPlan's PDDL) as a set of probable invariants that we intend to test.

3.2. Applying the constraints

In LatPlan, the internal categorical representations are continuous vectors whose elements converge to binary values during training. At each batch, we can compute, thanks to fuzzy logic, the truth value of any constraint expressed as a logical formula over these binary variables, for instance "$z_1$ is true". Then, by taking the inverse of this value and multiplying it by a normalizing factor, we obtain an additional loss that we can append to the total loss. This way, the network adapts its weights through backpropagation to comply with the constraint.

3.3. Augmenting the training loop with a search over invariants

Our idea is to integrate an automatic invariant finder into the learning process of LatPlan. More precisely, we want to perform a search over the invariants, similar to a hyperparameter search. Further details can be found in Appendix A. The implementation of this loop builds upon the t-norm-like functions that already exist in TensorFlow [12] and the possibility to customize the training loop in Keras [13], LatPlan being implemented with both.
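Below is a minimal sketch of what such a customized training step could look like in Keras. The encoder, decoder and constraint_fn callables, the plain reconstruction loss, and the wrapper class itself are hypothetical simplifications for illustration, not LatPlan's actual implementation.

```python
import tensorflow as tf

class ConstrainedSAE(tf.keras.Model):
    """Toy autoencoder wrapper that adds a fuzzy constraint penalty to the loss."""

    def __init__(self, encoder, decoder, constraint_fn, weight=1.0):
        super().__init__()
        self.encoder = encoder              # images -> continuous latent bits in [0, 1]
        self.decoder = decoder              # latent bits -> reconstructed images
        self.constraint_fn = constraint_fn  # latent batch -> fuzzy truth value in [0, 1]
        self.weight = weight

    def train_step(self, data):
        x = data
        with tf.GradientTape() as tape:
            z = self.encoder(x, training=True)
            x_rec = self.decoder(z, training=True)
            rec_loss = tf.reduce_mean(tf.square(x - x_rec))        # stand-in for LatPlan's loss
            penalty = self.weight * (1.0 - self.constraint_fn(z))  # fuzzy constraint term
            loss = rec_loss + penalty
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss, "reconstruction": rec_loss, "constraint_penalty": penalty}
```

In this setting, changing the set of enforced invariants between training runs only changes the constraint_fn passed to the model, which is what the search loop of Appendix A iterates over.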
4. Conclusions and Future Work

We discussed the idea of combining an automatic invariant finder with fuzzy logic to make progress on the SGP that affects the latent representations of a neuro-symbolic architecture (LatPlan). More precisely, the invariants found by the automatic tool are probable invariants, since the PDDL they are extracted from is learned by statistical inference, and we want to enforce them during training. If they lead to a lower state variance, it is likely that they correspond to ground truth invariants. The goal is to automatically find the invariants that are responsible for greater state stability. We are currently implementing the search loop introduced in Section 3.

References

[1] D. E. Wilkins, Domain-independent planning representation and plan generation, Artificial Intelligence 22 (1984) 269–301.
[2] P. Haslum, N. Lipovetzky, D. Magazzeni, C. Muise, An introduction to the planning domain definition language, Synthesis Lectures on Artificial Intelligence and Machine Learning 13 (2019) 1–187.
[3] M. Taddeo, L. Floridi, Solving the symbol grounding problem: a critical review of fifteen years of research, Journal of Experimental & Theoretical Artificial Intelligence 17 (2005) 419–445.
[4] M. Asai, A. Fukunaga, Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[5] D. P. Kingma, M. Welling, An introduction to variational autoencoders, Foundations and Trends® in Machine Learning 12 (2019) 307–392.
[6] S. Badreddine, A. d. Garcez, L. Serafini, M. Spranger, Logic tensor networks, Artificial Intelligence 303 (2022) 103649.
[7] L. A. Zadeh, Fuzzy logic, Computer 21 (1988) 83–93.
[8] V. Alcázar, A. Torralba, A reminder about the importance of computing and exploiting invariants in planning, in: Proceedings of the International Conference on Automated Planning and Scheduling, volume 25, 2015, pp. 2–6.
[9] A. Gerevini, L. K. Schubert, Discovering state constraints in DISCOPLAN: Some new results, in: AAAI/IAAI, 2000, pp. 761–767.
[10] M. Helmert, Concise finite-domain representations for PDDL planning tasks, Artificial Intelligence 173 (2009) 503–535.
[11] P. Hájek, Metamathematics of fuzzy logic, volume 4, Springer Science & Business Media, 2013.
[12] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: a system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[13] F. Chollet, et al., Keras, https://keras.io, 2015.

A. The proposed augmented training loop

This augmented training loop, GreedyConstraintTraining, is illustrated in Algorithm 1, which uses the following auxiliary functions:

• LatPlan, a function that embeds the training process of LatPlan: it receives as input a training set and a set of propositional formulae to embed in the loss function, and outputs the learned functions and the action descriptions as binary vectors;
• Vecs2Pddl, which transforms the action descriptions learned by LatPlan into PDDL format;
• Pddl2Inv, a function that returns a set of invariants of the planning problem provided in input in PDDL format; an example is the invariant finder of Fast Downward;
• MetricEval, which receives as input the learned functions and action descriptions of LatPlan and outputs a score: the higher the score, the better. In our case it is the inverse of the State Variance given in Formula (1) (so only the SAE function of LatPlan is needed);
• PickAnInvariant, a heuristic that identifies the most promising invariant in a set, for instance an invariant with the fewest variables (because it has a higher probability of being a ground truth invariant than one with more variables).

Algorithm 1 Our proposed GreedyConstraintTraining approach, which receives as input a training set tSet and returns Ω, the set of invariants which improve on a chosen metric

1:  Ω ← ∅
2:  LPModel ← LatPlan(tSet, ∅)
3:  Φ ← Pddl2Inv(Vecs2Pddl(LPModel))
4:  score ← MetricEval(LPModel)
5:  while Φ ≠ ∅ do
6:      φ ← PickAnInvariant(Φ)
7:      Φ ← Φ ∖ {φ}
8:      LPModel′ ← LatPlan(tSet, Ω ∪ {φ})
9:      if MetricEval(LPModel′) > score then
10:         Ω ← Ω ∪ {φ}
11:         Φ ← Φ ∪ Pddl2Inv(Vecs2Pddl(LPModel′))
12:         score ← MetricEval(LPModel′)
13:     end if
14: end while
15: return Ω
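For reference, a direct Python transcription of Algorithm 1 could look as follows; the lowercase helpers mirror the auxiliary functions listed above and are assumed to be provided (they are not part of any existing API).

```python
def greedy_constraint_training(t_set):
    """Python sketch of Algorithm 1 (GreedyConstraintTraining)."""
    omega = set()                                  # invariants that improved the metric (Omega)
    model = latplan(t_set, set())                  # train once without any constraint
    candidates = pddl2inv(vecs2pddl(model))        # probable invariants from the learned PDDL (Phi)
    score = metric_eval(model)                     # e.g. the inverse of the State Variance

    while candidates:
        phi = pick_an_invariant(candidates)        # heuristic: e.g. fewest variables first
        candidates.discard(phi)
        new_model = latplan(t_set, omega | {phi})  # retrain with the candidate enforced
        new_score = metric_eval(new_model)
        if new_score > score:                      # keep the invariant only if it helps
            omega.add(phi)
            candidates |= pddl2inv(vecs2pddl(new_model))
            score = new_score
    return omega
```

Each accepted invariant triggers a new invariant extraction on the retrained model, so the candidate set Φ can grow during the search, as in line 11 of Algorithm 1.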