From Subsymbolic to Symbolic: A Blueprint for Investigation

Joseph Pober 1,∗,†, Michael Luck 1,† and Odinaldo Rodrigues 1,†

1 Department of Informatics, Bush House, King's College London, London WC2B 4BG, UK

NESY 2022: 16th International Workshop on Neural-Symbolic Learning and Reasoning, Cumberland Lodge, Windsor, UK
∗ Corresponding author.
† These authors contributed equally.
Email: joseph.pober@kcl.ac.uk (J. Pober); michael.luck@kcl.ac.uk (M. Luck); odinaldo.rodrigues@kcl.ac.uk (O. Rodrigues)
ORCID: 0000-0002-0926-2061 (M. Luck); 0000-0001-7823-1034 (O. Rodrigues)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
In this paper, we sketch a framework for the integration of subsymbolic and symbolic representations, consisting of a series of layers and of mappings between elements across the layers. Each layer corresponds to a particular level of abstraction over phenomena in the environment observed in the layers below. Through an iterative process, the differences between elements in successive iterations within a given layer are captured as transformations between those elements and used for the identification and recognition of objects, as well as for the prediction and verification of the environment in future iterations. A bridge between the subsymbolic and symbolic levels can then be built by successively adding layers at ever more sophisticated levels of abstraction. This approach aims to benefit from subsymbolic learning while harnessing the abstraction and reasoning powers of classical symbolic AI techniques.

Keywords
neuro-symbolic integration, predicate learning, learning structured representations

1. Introduction

While extremely valuable, especially in domains with an abundance of existing data, such as game playing and natural language processing, subsymbolic techniques often have serious shortcomings in aspects that can be critical for the development of Artificial General Intelligence, e.g., abstraction, transfer learning and interpretability [1]. In humans, one can argue that these cognitive functions evolved in tandem with the development of natural language, allowing a deeper understanding of the world around us through the construction of ever more sophisticated abstract symbolic models. Such symbolic models have obvious advantages: they can be communicated between individuals, translated between (formal) languages, refined and revised, used in different domains, composed, and so on.

Classical AI often uses formal languages such as logic to represent the world and to model concepts such as actions and change. While such reasoning models overcome the shortcomings of purely subsymbolic approaches, they cannot easily learn from new data, or construct or revise themselves. Combining the power of subsymbolic approaches with the elegance and flexibility of symbolic ones has huge potential, and yet it has proven elusive [2].

The central message of this paper is that the bridge between subsymbolic and symbolic representations needs to be built as a series of layers of abstraction, with artefacts and concepts mapped between the layers, using basic building blocks through which symbols can be constructed from entirely subsymbolic data. The ultimate aim is to build a top-level symbolic layer that can be used to describe phenomena of interest perceived by the lower, subsymbolic ones. Our focus is on how to create these building blocks in an unsupervised fashion and on devising mechanisms by which they can be used to generate symbols representing a variety of concepts, e.g., objects, transformations and relations. We illustrate our ideas with the well-known game of Pong, which consists of a ball and two paddles. In our setting, the game is perceived through a sequence of still images associated with consecutive snapshots of the game.
[Figure 1 here: at stage i, an attention function maps image m_0 to areas of interest a_1, a_2, a_3 (highlighted in red), which identification maps to new symbols o_1, o_2, o_3; at stage i+k, attention maps image m_1 to areas of interest and recognition retrieves the same symbols.]
Figure 1: Initial image at the subsymbolic level, showing the identification of areas of interest in one stage and the subsequent recognition of objects at a later stage.

2. Learning about Objects

Suppose we want to describe in a symbolic way a scene perceived as a sequence M = m_0, m_1, m_2, … of still images, each represented as a matrix m_i of r × c picture elements (pixels). For simplicity, let us assume that these elements are binary, i.e., 0 or 1 (resp. 'white', the background, and 'black', the foreground). We can interpret each matrix as a snapshot of the world at a particular point in time, and the sequence of matrices as the world's evolution over time. This paper describes what is involved in moving from such input to a symbolic description, aiming to provide a blueprint for investigation.

In the first stage, we distinguish objects in the scene and associate symbols with them (a long-standing problem both in philosophy [3, 4, 5, 6] and in computer science [7, 8]). While there have been prior formalisations of this problem [9, 10, 11], the notion of 'object' is central to our approach. Here we need to make suitable assumptions about what is of interest (to an agent) in the image and, for simplicity, we assume that the black pixels are associated with objects of interest in the real world.

Consider the matrices in Figure 1. It is not difficult to identify, at the subsymbolic (pixel) level, the areas shown in red representing structured elements (or patterns) within our otherwise unstructured input space; we refer to these as objects. Let us assume that this can be done in an unsupervised manner using, e.g., an attention function [12, 13], defining areas of interest (AoIs) a_1, a_2, a_3, …. This process should allow not only the creation of tracking mechanisms for the areas of interest in the images, but also the creation of new symbols for real-world objects, and hence the establishment of an association between a real-world object, its perception and identification/recognition by the system (through some appropriate mechanism), and a symbolic representation. We aim to develop a framework in which objects, their properties, and the way in which they change can all be expressed and linked to these symbolic representations.

Our first objective is therefore to create a set of symbols and link them with a subsymbolic representation. To this end, the representation of an object, which we call a signature, exists in a more abstract space [14] than the direct visual representation of an AoI. The process of creating new symbols, e.g., o_1, o_2, o_3, and linking them to an abstract representation (see Figure 1, top), which we call identification, aims in part to allow these objects to be recognised in subsequent images.
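To make these two steps concrete, the sketch below is a minimal illustration, not a committed implementation: connected-component analysis stands in for an attention function such as those of [12, 13], and a translation-invariant pixel mask stands in for a learnt signature. The names areas_of_interest, signature and identify are hypothetical.

```python
from collections import deque

def areas_of_interest(m):
    """Stand-in for the attention function: return each 4-connected
    region of foreground (1) pixels in binary matrix m as a set of
    (row, col) coordinates -- one area of interest (AoI) per region."""
    rows, cols = len(m), len(m[0])
    seen, aois = set(), []
    for r in range(rows):
        for c in range(cols):
            if m[r][c] == 1 and (r, c) not in seen:
                region, queue = set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and m[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                aois.append(region)
    return aois

def signature(aoi):
    """Toy signature: the AoI's shape normalised to the origin. Position
    is deliberately excluded, so pure movement leaves the signature
    unchanged and it lives in a more abstract space than the AoI."""
    y0 = min(y for y, _ in aoi)
    x0 = min(x for _, x in aoi)
    return frozenset((y - y0, x - x0) for y, x in aoi)

def identify(m, table):
    """Identification: attach a fresh symbol o1, o2, ... to each signature
    not seen before; a known signature is instead recognised, i.e. its
    previously created symbol is retrieved from the table."""
    for aoi in areas_of_interest(m):
        sig = signature(aoi)
        if sig not in table:
            table[sig] = f"o{len(table) + 1}"
    return table
```

In this toy setting, recognition reduces to an exact signature match in the table; the point of the sketch is only the shape of the pipeline, since real inputs immediately break exact matching, as we now discuss.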
Recognition is not trivial: differences between the AoIs of the same object arise due to movement, inaccurate sensor information, etc., so a signature must carry sufficient information to allow the association of an object's signature in a new image with those from old images, and the consequent retrieval of the previously associated symbol — we call this process recognition (see Figure 1, bottom).

By comparing distinct objects (in the same image), we can learn about their similarities and differences [15, 16], generating a set of subsymbolic abstract properties ρ_1, ρ_2, …, each assigned its own symbol, p_1, p_2, …, thus linking each property's symbol to its subsymbolic representation in a process analogous to that of object-symbol creation. Initially, these properties are broad yet object-specific — they are learnt from only one object. Once multiple objects have been identified and primitive properties created, these can be compared, and the similarities and distinctions found can be expressed as new properties themselves. For example, comparing multiple objects that share a common feature, such as having the colour blue, produces a set of similar 'shared properties'. If we apply the comparison again to the subsymbolic signatures of these shared properties, we can in turn determine what they have in common and what makes them distinct, eventually isolating a specific property of interest, such as 'blueness'. Once we have these primitives, we can redefine objects in terms of the properties they have. This has two important effects: it not only increases the vocabulary of our symbolic language, but also allows objects to be symbolically represented in a non-atomic way. Non-atomicity is essential for the symbolic description of how an object changes (see Section 3). Of course, this suggests that the transition from subsymbolic to symbolic should be made through (several) layers of abstraction, until the desired level of granularity at the symbolic level is achieved.
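The comparison step can be sketched in the same illustrative spirit. The primitive descriptors in features below are hand-picked stand-ins (in a full system they would themselves be learnt from individual objects), and any feature-value pair shared by two or more signatures is promoted to a property with a fresh symbol p1, p2, …; both function names are hypothetical.

```python
def features(sig):
    """Hypothetical primitive descriptors read off a signature; stand-ins
    for properties learnt at the subsymbolic level."""
    h = 1 + max(y for y, _ in sig)
    w = 1 + max(x for _, x in sig)
    return {"area": len(sig), "height": h, "width": w,
            "elongated": max(h, w) >= 3 * min(h, w)}

def shared_properties(sigs):
    """Compare signatures: group them by common feature-value pairs and
    give each pair shared by at least two objects a symbol p1, p2, ...;
    e.g. the two Pong paddles would share ('elongated', True)."""
    groups = {}
    for sig in sigs:
        for fv in features(sig).items():
            groups.setdefault(fv, []).append(sig)
    shared = sorted((fv for fv, ms in groups.items() if len(ms) > 1), key=str)
    return {fv: f"p{i + 1}" for i, fv in enumerate(shared)}
```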
3. Reasoning About Transformations

Assuming we have mechanisms for uniquely identifying and recognising objects, we now want to represent how objects change over time, aiming to express temporal transformations symbolically while retaining the association between each transformation and its representation. To be clear, if we perceive a change in the AoIs associated with a recognised object in successive images, we also want to associate a symbol with this change, allowing future reasoning about the transformation's 'meaning'.

[Figure 2 here: on the left, matrix m_0 is transformed into m_1 by τ_0; on the right, the AoIs a_1^0, a_2^0, a_3^0 of m_0 are transformed into a_1^1, a_2^1, a_3^1 of m_1 by τ_0^1, τ_0^2, τ_0^3.]
Figure 2: Variations between subsequent images, due to the leftmost paddle and the ball moving up.

Consider the transformation τ_0 of matrix m_0 into m_1 on the left of Figure 2. At the subsymbolic level, we can consider the transformation applied to m_0 as a whole, but to reason about the objects in the image, we are more interested in the transformations applied to the AoIs a_1, a_2 and a_3, denoted by τ_0^1, τ_0^2 and τ_0^3, respectively (see the right-hand side of Figure 2). Assume that the area a_i^t is associated with object o_i in matrix m_t; for example, a_1^0 is associated with object o_1 in m_0, and a_1^1 with object o_1 in m_1, etc. Our task is to define the commutative diagram in Figure 3, so that σ(a_i^t) is indeed a faithful representation of object o_i in m_t.

[Figure 3 here: a commutative square in which a_1^0 is mapped to a_1^1 by τ_0^1 at the subsymbolic level; σ(a_1^0) is mapped to σ(a_1^1) by σ(τ_0^1) at the symbolic level; and ∗-labelled dotted arrows take each area to its symbolic translation.]
Figure 3: Relationship between subsymbolic and symbolic levels.

The ∗ in the commutative diagram of Figure 3 is intentional, because in general multiple translations may be necessary to achieve the right level of abstraction at the symbolic level. In addition, the dotted lines indicate that the transformations may not necessarily be precise (see Section 4). This potential need for multiple translations is especially important in the generalisation of transformations involving the same object through a sequence of matrices.

Consider the problem of defining the concept of moving an object 'up' in a matrix and then generalising this notion to other objects. At the subsymbolic level, this operation transforms the region a_1^0 (of m_0) into a_1^1 (of m_1). We could give this transformation a symbol so that, e.g., the transformation τ_0^1 is represented by the atomic symbol σ(τ_0^1), but then we would not be able to describe, in symbolic terms, what the transformation entails, and hence it would be impossible to generalise it or to reason about it in more abstract terms. This means that in general we want σ(a_1^0) not to be atomic in our symbolic language and, consequently, σ(τ_0^1) should be expressed in terms of how it affects specific properties of objects (in this case, o_1's 'location'). σ(τ_0^1) must provide a faithful representation of τ_0^1, meaning that the transformation of σ(a_1^0) into σ(a_1^1) must be such that σ(a_1^1) indeed represents a_1^1, i.e., effectively how o_1 would be identified/recognised in m_1 (hence making the diagram of Figure 3 commute). Because of the complexity of the operations involved, it may be sufficient for this process simply to be a "good enough" approximation.

The next question is then: what should the appropriate level of granularity of these representations be? The answer is not simple, but in principle we want the granularity to reflect the level of reasoning we hope to be able to capture. As an example, suppose that we need the relative positioning of the "boundaries" of o_1 within m_0 and m_1 (a reasonable assumption if we need to express the "movement" of objects). This means that we need to assume some coordinate/geometrical primitives that are available to us at the subsymbolic level, so that we can understand how to represent them symbolically. Thus, the identification process of o_1 must include these elements, yielding a signature for o_1 that is not only sufficient for the recognition of o_1 in different matrices, but that also contains the basic ingredients for the description of the properties of o_1 that we need at the symbolic level. One such property could be, for example, o_1's position relative to a common coordinate system or to another area of interest (e.g., another object). Once we are happy with the basic components of o_1's signature, we can associate it with a_1^0, and our job will be complete if σ(τ_0^1) is capable of producing a faithful symbolic representation of the signature of a_1^1 in the form of σ(a_1^1).

We gloss over a number of complications for now, but it should be easy to see that we can only describe concepts at the symbolic level that we can somehow capture at the subsymbolic one. In practice, this means that what we associate with o_1 in Figure 1 at the subsymbolic level needs to capture enough of the relationship of o_1 with the rest of m_i for us to be able to describe it at the symbolic level in sufficient detail for the application in mind.
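As an illustration of a non-atomic σ(τ), and reusing the signature function from the earlier sketch, a pure translation between two AoIs of the same object can be lifted to a symbolic description of its effect on the 'location' property. The helpers location, lift and apply_move are hypothetical names of ours, a sketch rather than the framework's method.

```python
def location(aoi):
    """The 'location' property: the top-left corner of the AoI's bounding
    box, one of the coordinate primitives assumed at the subsymbolic level."""
    return (min(y for y, _ in aoi), min(x for _, x in aoi))

def lift(aoi_before, aoi_after, obj):
    """Express tau symbolically, i.e. a candidate sigma(tau), as its effect
    on the 'location' property of obj, provided the shape is unchanged.
    Returns, e.g., ('move', 'o1', -2, 0): object o1 moved up by two rows."""
    if signature(aoi_before) != signature(aoi_after):
        return None  # not a pure translation; a richer sigma(tau) is needed
    (y0, x0), (y1, x1) = location(aoi_before), location(aoi_after)
    return ("move", obj, y1 - y0, x1 - x0)

def apply_move(state, move):
    """Apply sigma(tau) entirely at the symbolic level, where state maps
    each object symbol to its properties. The diagram of Figure 3 commutes
    when the state computed here matches the symbolic translation of the
    AoI actually observed next."""
    _, obj, dy, dx = move
    y, x = state[obj]["location"]
    return {**state, obj: {**state[obj], "location": (y + dy, x + dx)}}
```

Because lift describes the change as an effect on a property rather than as an opaque symbol, the same ('move', ·, dy, dx) pattern generalises from the paddle to the ball, which is precisely what an atomic σ(τ_0^1) could not do.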
4. General Knowledge, Verification, Predictions and Revisions

Consider the game of Pong mentioned in Section 1, represented at the subsymbolic level as a sequence of matrices M = m_0, m_1, …. Some areas in the matrices are associated with objects, i.e., the ball and the paddles. The objects may change through the sequence, e.g., by appearing in different locations in the matrices. These changes represent the notion of movement of the objects and can be used to describe how the scene evolves through time. We can perceive the changes between matrices as transformations τ_0, τ_1, etc. (see Figure 4). The challenge is to find a mechanism that describes the transformations in a way that is adequate for the application at hand. For example, we would like to describe τ_0 as a change in the relative y-coordinate of the left paddle and in the x-coordinate of the ball in m_1 with respect to their values in m_0, while still being able to associate the corresponding AoIs in m_1 with the 'same' objects identified in m_0. Eventually, we want to produce a sequence of representations σ(a_i^0), σ(a_i^1), etc., that describe the objects in a symbolic language, leading to symbolic representations σ(m_0), σ(m_1), … of the matrices themselves.

[Figure 4 here: m_0 —τ_0→ m_1 —τ_1→ m_2 —τ_2→ m_3 —τ_3→ m_4.]
Figure 4: Transformations across a sequence of images.

To a large extent, we have only considered objects in isolation, but in general the subsymbolic representation may encode domain information about the objects' relationships that is valuable for symbolic reasoning. In our example, information embedded in the matrices m_i, such as the total number of objects or the objects' directions of travel with respect to each other, will not generally be captured by the individual signatures of the objects identified. An important consideration is therefore how to incorporate general knowledge Γ that we need to add to σ(m_j) such that Γ ∪ σ(m_j) is also faithful with respect to m_j for the particular application. Note that this notion of general knowledge is related to, but distinct from, that of the common sense priors mentioned in [17].

The transformations τ_i should eventually allow us to perform basic predictions about what future matrices should look like, e.g., by simulating the application of transformations to generate the next state. Verification of the predictions against actual inputs should then provide opportunities for revisions that produce more accurate translations over time. This gives the whole process a temporal dimension as well, depicted along the horizontal axis of the diagram in Figure 5.

[Figure 5 here: at the subsymbolic level, matrices m_0, m_1, …, m_i are linked by τ_0, …, and their AoIs a_1^t, a_2^t, a_3^t by τ_t^1, τ_t^2, τ_t^3; at the symbolic level, the translations σ(a_i^t) are linked by σ(τ_t^i), and the theories σ(m_0) ∪ Γ, σ(m_1) ∪ Γ, …, σ(m_i) ∪ Γ by σ(τ_0), …; verif/pred arrows connect the two levels at every stage.]
Figure 5: Relationship between subsymbolic and symbolic levels and successive matrices.
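Closing the loop of Figure 5 in the same illustrative style, and reusing apply_move from the previous sketch: prediction simulates one step at the symbolic level, and verification compares that prediction with the translation of the matrix actually perceived next. The revision this enables is, of course, far more involved in practice; predict and verify are hypothetical names.

```python
def predict(state, moves):
    """Simulate one time step at the symbolic level by applying every
    lifted transformation sigma(tau_i) to the current symbolic state."""
    for mv in moves:
        state = apply_move(state, mv)
    return state

def verify(predicted, observed):
    """Compare the prediction with the symbolic translation of the matrix
    actually perceived next; any mismatch flags a translation (or part of
    the general knowledge Gamma) as a candidate for revision."""
    return {o: (predicted[o], observed.get(o))
            for o in predicted if predicted[o] != observed.get(o)}
```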
5. Conclusions

In this paper, we proposed the conceptualisation of a multi-layered framework for neuro-symbolic integration with learning. Although we have not provided a concrete instantiation, this level of separation between symbolic and subsymbolic reasoning, together with the proposed mode of integration, is novel and generalises other approaches by providing an initial scaffold upon which specific techniques can be employed. The initial focus is on how to define the basic ingredients with which a generic neuro-symbolic process can be devised that allows the association of subsymbolic components with symbolic counterparts in a way that avoids hand-crafted symbols. We envisage the definition of object 'signatures' associating areas of interest in an image, the process that recognises them, and other components derived over time; and we postulate that comparisons between signatures can allow for the definition of some primitive concepts, such as properties, transformations, etc. More complex concepts can then be built from these in layers at increasing levels of abstraction, eventually leading to the development over time of a symbolic language arising from completely subsymbolic inputs. Finally, we see revisions of the model arising through comparisons of predictions of future states with actual inputs. In future work, we will investigate how interactions and relationships between objects can be captured symbolically using these ideas.

Acknowledgments

This work was supported by UK Research and Innovation grant number EP/S023356/1, in the UKRI Centre for Doctoral Training in Safe and Trusted Artificial Intelligence (https://www.safeandtrustedai.org).

References

[1] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, Building machines that learn and think like people, Behavioral and Brain Sciences 40 (2017) e253. doi:10.1017/S0140525X16001837.
[2] B. Goertzel, Perception processing for general intelligence: Bridging the symbolic/subsymbolic gap, in: J. Bach, B. Goertzel, M. Iklé (Eds.), Artificial General Intelligence, volume 7716 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 79–88. doi:10.1007/978-3-642-35506-6_9.
[3] M. Taddeo, L. Floridi, Solving the symbol grounding problem: A critical review of fifteen years of research, Journal of Experimental & Theoretical Artificial Intelligence 17 (2005) 419–445. doi:10.1080/09528130500284053.
[4] A. Cangelosi, A. Greco, S. Harnad, Symbol grounding and the symbolic theft hypothesis, in: A. Cangelosi, D. Parisi (Eds.), Simulating the Evolution of Language, Springer, London, 2002, pp. 191–210. doi:10.1007/978-1-4471-0663-0_9.
[5] L. Steels, The symbol grounding problem has been solved. So what's next?, in: M. de Vega (Ed.), Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, 2008.
[6] S. Bringsjord, The symbol grounding problem … remains unsolved, Journal of Experimental & Theoretical Artificial Intelligence 27 (2015) 63–72. doi:10.1080/0952813X.2014.940139.
[7] R. Cubek, W. Ertel, G. Palm, A critical review on the symbol grounding problem as an issue of autonomous agents, in: S. Hölldobler, R. Peñaloza, S. Rudolph (Eds.), KI 2015: Advances in Artificial Intelligence, volume 9324 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2015, pp. 256–263. doi:10.1007/978-3-319-24489-1_21.
[8] S. Coradeschi, A. Loutfi, B. Wrede, A short review of symbol grounding in robotic and intelligent systems, KI - Künstliche Intelligenz 27 (2013) 129–136. doi:10.1007/s13218-013-0247-2.
[9] K. Greff, S. van Steenkiste, J. Schmidhuber, On the binding problem in artificial neural networks, arXiv (2020). URL: http://arxiv.org/abs/2012.05208.
[10] T. R. Besold, K.-U. Kühnberger, A. S. d. Garcez, A. Saffiotti, M. H. Fischer, A. Bundy, Anchoring knowledge in interaction: Towards a harmonic subsymbolic/symbolic framework and architecture of computational cognition, in: J. Bieger, B. Goertzel, A. Potapov (Eds.), Artificial General Intelligence, volume 9205 of Lecture Notes in Computer Science, 2015, pp. 35–45. doi:10.1007/978-3-319-21365-1_4.
[11] T. R. Besold, A. S. d. Garcez, S. Bader, H. Bowman, P. M. Domingos, P. Hitzler, K.-U. Kühnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, G. Zaverucha, Neural-symbolic learning and reasoning: A survey and interpretation, arXiv (2017). URL: http://arxiv.org/abs/1711.03902.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017, pp. 5998–6008.
[13] C. P. Burgess, L. Matthey, N. Watters, R. Kabra, I. Higgins, M. Botvinick, A. Lerchner, MONet: Unsupervised scene decomposition and representation, arXiv (2019). URL: http://arxiv.org/abs/1901.11390.
[14] J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, J. Wu, The Neuro-Symbolic Concept Learner: Interpreting scenes, words, and sentences from natural supervision, in: 7th International Conference on Learning Representations, ICLR 2019, 2019. URL: https://openreview.net/forum?id=rJgMlhRctm.
[15] P. J. Blazek, M. M. Lin, A neural network model of perception and reasoning, arXiv (2020). URL: http://arxiv.org/abs/2002.11319.
[16] L. A. A. Doumas, G. Puebla, A. E. Martin, J. E. Hummel, Relation learning in a neurocomputational architecture supports cross-domain transfer, in: S. Denison, M. Mack, Y. Xu, B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, CogSci 2020, Montreal, 2020, pp. 932–937. URL: https://cogsci.mindmodeling.org/2020/papers/0165/index.html.
[17] M. Garnelo, K. Arulkumaran, M. Shanahan, Towards deep symbolic reinforcement learning, arXiv (2016). URL: http://arxiv.org/abs/1609.05518.