Request confirmation networks for neuro-symbolic script execution

Joscha Bach*
Program for Evolutionary Dynamics, Harvard University
Cambridge, MA 02138
joscha@mit.edu

Priska Herger
micropsi industries GmbH
Berlin, Germany
priska@micropsi-industries.com

* http://micropsi.com

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

We propose Request Confirmation Networks (ReCoNs) to combine neural learning with the sequential, hierarchical detection of sensory features, and to facilitate planning and script execution. ReCoNs are spreading activation networks with units that contain an activation and a state, and are connected by typed directed links that indicate partonomic relations and spatial or temporal succession. By passing activation along the links, ReCoNs can perform both neural computations and controlled script execution. We demonstrate an application of ReCoNs in the cognitive architecture MicroPsi2 for an active perception task in a virtual environment.

1 Introduction

MicroPsi is a cognitive architecture that combines neuro-symbolic representations [1] with situated perception and a motivational system [2, 3]. MicroPsi agents live in an open environment that they have to explore and navigate; currently we are using the game environment Minecraft [4]. To this end, agents require mechanisms for bottom-up/top-down perception, reinforcement learning, motivation, decision making and action execution. We are using MicroPsi to study how to combine perceptual and conceptual representations, and to facilitate autonomous learning with full perceptual grounding.

Cognitive architectures with perceptual grounding require a way to combine symbolic and subsymbolic operations: planning, communication and reasoning rely on discrete, symbolic representations, while fine-grained visual and motor interaction requires distributed representations. The standard solution is a hybrid architecture that combines a neural network layer, i.e. a carefully crafted model that learns to compress the perceptual input, with a symbolic layer that uses the extracted regularities for deliberation and control [5]. However, while such a dual architecture appears to be a straightforward solution from an engineering point of view, we believe that there is a continuum between perceptual and conceptual representations: both should use the same set of representational mechanisms, and differ primarily in the operations that are performed upon them. In our view, symbolic/localist representations are best understood as a special case of subsymbolic/distributed representations, for instance ones where the weights of the connecting links are close to discrete values. Highly localist features often emerge in neural learning, and rules expressed as discrete-valued links can be used to initialize a network for capturing more detailed, distributed features (see KBANN [6]).

MicroPsi's representations combine neural network principles and symbolic operations via recurrent spreading activation networks with directed links. The individual nodes in the network process information by summing up the activation that enters the node through slots, and by calculating a function for each of their gates, which are the origin of links to other nodes. Node types differ in the number of gates and slots they have, and in the functions and parameters of their gates. Typed links can be expressed by the type of their gate of origin.
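For illustration, this node model can be sketched in a few lines of Python. The names and structure below are our simplification for exposition, not the actual MicroPsi2 API; in particular, we assume one slot and one gate per link type and a single summed net input per node.

```python
# Minimal sketch of a MicroPsi-style node with slots and gates (illustrative
# only; names and structure are our simplification, not the MicroPsi2 API).

class Gate:
    def __init__(self, gate_type, fn=lambda x: x):
        self.gate_type = gate_type  # e.g. "gen", "por", "ret", "sub", "sur"
        self.fn = fn                # per-gate output function (with parameters)
        self.links = []             # outgoing links: (target_node, slot_type, weight)
        self.activation = 0.0

class Node:
    def __init__(self, link_types):
        # one slot and one gate per link type; node types differ in this set
        self.slots = {t: 0.0 for t in link_types}
        self.gates = {t: Gate(t) for t in link_types}

    def calculate(self):
        # sum the activation that entered through the slots, then compute
        # each gate's output from it
        net = sum(self.slots.values())
        for gate in self.gates.values():
            gate.activation = gate.fn(net)

def propagate(nodes):
    # typed links originate at gates and deliver weighted activation to slots
    for node in nodes:
        for t in node.slots:
            node.slots[t] = 0.0
    for node in nodes:
        for gate in node.gates.values():
            for target, slot_type, weight in gate.links:
                target.slots[slot_type] += weight * gate.activation
```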
The most common node type in earlier MicroPsi implementations is called a concept node. Concept nodes possess seven gate types (with approximate semantics in parentheses): gen (associated), por (successor), ret (predecessor), sur (part-of), sub (has-part), exp (is-exemplar-of), cat (is-a). Concept nodes can be used to express hierarchical scripts [7] by linking sequences of events and actions using por/ret, and by subsuming these sequences into hierarchies using sub/sur.

In MicroPsi2, a perceptual representation amounts to a hierarchical script for testing the presence of an object in the environment. At each level of the hierarchy, the script contains disjunctions and subjunctions of substeps, which bottom out in distributed substeps, and eventually in sensor nodes that reflect measurements in the environment and in actuator nodes that move the agent or its sensors. Recognizing an object requires the execution of this hierarchical script. In the past, this required a central executive that used a combination of explicit backtracking and activation propagation. We have replaced this mechanism with a completely distributed mode of execution, called request confirmation network, that only requires the propagation of activation along the links of connected nodes.

2 Request confirmation networks

Modeling perception in a cognitive architecture requires an implementation of top-down/bottom-up processing, in which sensors deliver cues (bottom-up) to activate higher-level features, while perceptual hypotheses produce predictions of features that have to be verified by active sensing (top-down). When using a neural network paradigm, the combination of bottom-up and top-down processing requires recurrence. Request confirmation networks are a solution that combines constrained recurrence with the execution of action sequences. They provide a neural implementation of sensorimotor scripts.

Request Confirmation Networks (ReCoNs) are implemented within MicroPsi2's formalism, but are in fact a more general paradigm for auto-executable networks of stateful nodes with typed links. We therefore provide a general definition that is independent of our implementation. A request confirmation network can be defined as a set of units U and a set of edges E, with

U = {script nodes} ∪ {terminal nodes}
E = {edges of type por, ret, sub, or sur} (defined below)

A script node has a state s ∈ {inactive, requested, active, suppressed, waiting, true, confirmed, failed} and an activation a ∈ ℝⁿ, which can be used to store additional state. (In MicroPsi2, the activation is used to store the results of combining feature activations along the sub links.) A terminal node performs a measurement; it has a state s ∈ {inactive, active, confirmed} and an activation a ∈ ℝⁿ, which represents the value obtained from the measurement.

An edge is defined as a tuple ⟨u_s, u_t, type ∈ {por, ret, sub, sur}, w ∈ ℝⁿ⟩, where u_s and u_t denote the source and target unit: por connects to a successor node, ret to a predecessor node, sur to a parent node, and sub to a child node. w is a link weight with n dimensions that can be used to perform additional computations. Each pair of nodes (u_s, u_t) is either unconnected, or joined by exactly one pair of links of the types por/ret or sub/sur. Each script node must have at least one link of type sub, i.e. at least one child that is either a script node or a terminal node. Script nodes can be the source and target of links of all types, whereas terminal nodes can only be targeted by links of type sub and be the origin of links of type sur.
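This definition translates directly into data structures. The following sketch is our transcription for illustration (field names are ours, not the MicroPsi2 source); for brevity, activations and weights are scalar rather than n-dimensional.

```python
# Direct transcription of the ReCoN definition into Python data structures
# (a sketch for illustration; field names are ours, not the MicroPsi2 source).
from dataclasses import dataclass

SCRIPT_STATES = {"inactive", "requested", "active", "suppressed",
                 "waiting", "true", "confirmed", "failed"}
TERMINAL_STATES = {"inactive", "active", "confirmed"}
LINK_TYPES = {"por", "ret", "sub", "sur"}

@dataclass
class ScriptNode:
    state: str = "inactive"    # one of SCRIPT_STATES
    activation: float = 0.0    # a ∈ ℝⁿ in general; scalar here for brevity

@dataclass
class TerminalNode:
    state: str = "inactive"    # one of TERMINAL_STATES
    activation: float = 0.0    # the value obtained from the measurement

@dataclass
class Edge:
    source: object             # u_s
    target: object             # u_t
    type: str                  # one of LINK_TYPES; por/ret and sub/sur
    weight: float = 1.0        # occur in pairs; w ∈ ℝⁿ, scalar here
```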
ReCoNs solve the problem of constraining the flow of control in a hierarchical script at the level of individual units, without using topological information, i.e. without a centralized interpreter and without informing individual units about their neighboring elements. To do this, they need to store the execution state within the script elements in a distributed manner; each of these units is a state machine with one of eight states (as defined above). Each unit implements connections to its neighbors using directional links.

Initially, all units are inactive. The execution of a script starts by sending the signal 'request' to its root node. Semantically, the root node represents a hypothesis that is spelled out in the script and that we want to test; we request the validation of the root node. After the script has executed, the root node will be in the state "confirmed" or "failed" (until we turn off the 'request' signal and the unit becomes inactive again).

Figure 1: Example of script execution.

During the validation, a node needs to validate all of its children by sending a 'request' signal along its sub links. The children may either form sequences or alternatives, cf. Figure 1. The former are validated in succession, the latter in parallel. Successive execution requires that units prevent their successors from becoming "active" until it is their turn; they do this by sending an 'inhibit request' signal along their por links. If a requested unit receives an 'inhibit request' signal, it becomes "suppressed" until its predecessor becomes "true" and turns off the signal. Active elements tell their parent to wait for their validation to finish by sending it a 'wait' message. Each active element remains in the "waiting" state until either one of its children sends a 'confirm' message (in which case it turns into the state "true"), or no more children ask it to wait (in which case it turns into the state "failed"). In sequences, we also need to ensure that only the last element of a sequence can confirm the parent request, so each unit sends an 'inhibit confirm' signal via ret to its predecessors. The last unit in a sequence does not receive an 'inhibit confirm' signal, and can turn into the state "confirmed" to send the 'confirm' signal to the parent. The execution of the script can be reset or interrupted at any time by removing the 'request' from its root node.

A ReCoN can be used to execute a script with discrete activations, but it can also perform additional operations along the way. This is done by calculating additional activation values during the request and confirmation steps. During the confirmation step (when a node turns into the state "confirmed" or "true"), the activation of that node is calculated from the activations of its children and the weights of the sur links from these children. During the requesting step, children may receive parameters from their parents, which are calculated using the parent activation and the weights of the sub links from their parents. This mechanism can be used to adapt ReCoNs to a variety of associative classification and learning tasks.
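To summarize this control flow, a minimal lockstep execution loop can be written as follows. This is our illustration, not MicroPsi2 code: all units update synchronously, and the only external intervention is the persistent 'request' on the root node. The emit_messages and transition methods it relies on are sketched in section 2.1 below.

```python
# Minimal lockstep driver for a ReCoN (illustrative; the per-unit behavior
# is sketched in section 2.1).

def execute_script(units, root, max_steps=1000):
    """Request the root node and step all units until it confirms or fails."""
    for _ in range(max_steps):
        # Phase 1: every unit emits messages according to its current state.
        inbox = {unit: set() for unit in units}
        for unit in units:
            for target, message in unit.emit_messages():
                inbox[target].add(message)
        inbox[root].add("request")   # the root is requested from outside
        # Phase 2: every unit transitions based on its incoming messages.
        for unit in units:
            unit.transition(inbox[unit])
        if root.state in ("confirmed", "failed"):
            break
    return root.state
```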
2.1 A message passing definition of a ReCoN unit

The units of a ReCoN implement a finite state machine, as depicted in Figure 2. This can be realized by defining the eight discrete states explicitly and by using a set of messages that are distributed along the por, ret, sub and sur links, depending on the current state of each unit. These messages can be defined as {request, inhibit request, inhibit confirm, wait, confirm}; they are passed as specified in Table 1. In each state, incoming messages are evaluated, and a node either stays in its current state or switches to the next one if a condition is fulfilled. These conditions are given in Figure 2.

Unit state     | por             | ret             | sub     | sur
inactive (ø)   | –               | –               | –       | –
requested (R)  | inhibit request | inhibit confirm | –       | wait
active (A)     | inhibit request | inhibit confirm | request | wait
suppressed (S) | inhibit request | inhibit confirm | –       | –
waiting (W)    | inhibit request | inhibit confirm | request | wait
true (T)       | –               | inhibit confirm | –       | –
confirmed (C)  | –               | inhibit confirm | –       | confirm
failed (F)     | inhibit request | inhibit confirm | –       | –

Table 1: Message passing in ReCoNs.

Figure 2: State transitions.
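Table 1 translates directly into code. The sketch below fills in the unit behavior assumed by the execution loop above; the message table is a transcription of Table 1, while the transition conditions are our paraphrase of Figure 2 and the prose in section 2. The short grace period before failing mirrors the role of the delay neurons (DR, DF) in the neural realization of section 2.2.

```python
# Messages sent per state and link type: a transcription of Table 1.
MESSAGES = {
    "inactive":   {},
    "requested":  {"por": "inhibit request", "ret": "inhibit confirm",
                   "sur": "wait"},
    "active":     {"por": "inhibit request", "ret": "inhibit confirm",
                   "sub": "request", "sur": "wait"},
    "suppressed": {"por": "inhibit request", "ret": "inhibit confirm"},
    "waiting":    {"por": "inhibit request", "ret": "inhibit confirm",
                   "sub": "request", "sur": "wait"},
    "true":       {"ret": "inhibit confirm"},
    "confirmed":  {"ret": "inhibit confirm", "sur": "confirm"},
    "failed":     {"por": "inhibit request", "ret": "inhibit confirm"},
}

class Unit:
    def __init__(self, links=()):
        self.state = "inactive"
        self.links = list(links)   # outgoing (target, link_type) pairs
        self.patience = 0          # grace period before failing (cf. DF)

    def emit_messages(self):
        table = MESSAGES[self.state]
        return [(target, table[t]) for target, t in self.links if t in table]

    def transition(self, inbox):
        # Our paraphrase of the transition conditions (cf. Figure 2).
        if "request" not in inbox:           # request withdrawn: reset
            self.state, self.patience = "inactive", 0
        elif self.state == "inactive":
            self.state = "requested"
        elif self.state in ("requested", "suppressed"):
            self.state = ("suppressed" if "inhibit request" in inbox
                          else "active")
        elif self.state == "active":         # children have been requested
            self.state, self.patience = "waiting", 0
        elif self.state == "waiting":
            if "confirm" in inbox:
                self.state = "true"
            elif "wait" in inbox:
                self.patience = 0
            else:                            # allow messages still in flight
                self.patience += 1           # to arrive before giving up
                if self.patience > 1:
                    self.state = "failed"
        elif self.state == "true" and "inhibit confirm" not in inbox:
            self.state = "confirmed"         # last element of its sequence

class Terminal(Unit):
    """Leaf unit: performs a measurement instead of requesting children
    (a simplification of the terminal-node semantics of section 2)."""
    def __init__(self, measure):
        super().__init__()
        self.measure = measure               # callable returning True/False

    def transition(self, inbox):
        if "request" not in inbox:
            self.state = "inactive"
        else:
            self.state = "confirmed" if self.measure() else "inactive"
```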
2.2 A neural definition of a ReCoN unit

It is possible to realize a ReCoN unit with an ensemble of artificial neurons. Figure 3 shows an arrangement of ten simple threshold elements with only excitatory and inhibitory links that satisfies these conditions. Let the activation of a neuron be defined as

αⱼ = { Σᵢ wᵢⱼ · αᵢ   if wᵢⱼ · αᵢ ≥ 0 for all wᵢⱼ, αᵢ
     { 0             otherwise

i.e. any incoming activation on a link with a negative weight is sufficient to entirely inhibit the neuron. A message is any activation value sent over a link that crosses the boundary between units. Units may send the message 'request' to their children (top-down), 'wait' and 'confirm' to their parents (bottom-up), 'inhibit request' to their successors, and 'inhibit confirm' to their predecessors (lateral inhibition).

We want all activity in a unit to cease as soon as the unit is no longer requested. Thus, all activation in the unit is derived from the activation of the incoming request message. The request activation is sent directly to the neurons IC, IR, and W. IC will inhibit 'confirm' signals by predecessor units, and IR will inhibit requests to child nodes by successor units, before the current unit is confirmed. W informs the parent node that the unit has active (non-failed) children by sending the 'wait' message. IR also prevents the unit from confirming prematurely, i.e. before it has received a 'confirm' message from one of its children. IC then passes on its activation to R, and R sends out a request to the unit's children, unless it is inhibited by the predecessor node. To give predecessor nodes enough time to send the inhibition signal, the flow of activation between IC and R is delayed by DR.

F becomes active as soon as the unit has no more active children, and represents the failure state of the unit. Like all states, it must receive its activation from the initial request. The unit cannot be allowed to fail before it has sent a request to its children, so F is inhibited by the predecessor's 'inhibit request' message at the helper neuron R′. Nor can it fail before the children cease sending their 'wait' message, so F is also inhibited by that message. F must furthermore allow enough time for requested children to respond with this message, so its activation is delayed by the helper neuron DF. Once F becomes active (i.e. the unit fails), it stops the 'request' to the children by inhibiting R, and the 'wait' to the parent by inhibiting W. (F cannot get its activation directly from R, because F will later inhibit R and would thus remove its own activation; it is therefore activated through the helper neuron R′.)

The neuron T represents that the unit has been confirmed (true). It receives its activation from the original request, but is not allowed to become active due to the inhibition by IC. Only when IC is turned off by the 'confirm' message of one of the unit's children does T become active; it then turns off W (the 'wait' message to the parent), R (the request to the children) and IR (the inhibition of the successors, and of itself). If the unit has no successors (i.e. receives no 'inhibit confirm' message), it will signal T's value via C as a 'confirm' message to the unit's parent. This is just one of many possible ways in which a ReCoN unit could be realized with artificial neurons.

Figure 3: Schematic of a neural ReCoN specification.

In MicroPsi2, the functionality of this circuitry is embedded in a single node, which requires only one calculation step and reduces memory usage; cf. section 3.1.

3 Implementation

3.1 A compact implementation of ReCoNs

Rather than explicitly modeling the state machine of a ReCoN unit, this implementation makes use of the fact that the state of a ReCoN unit is fully determined by the activations of connected nodes in the previous time step. States are implemented as a set of simple arithmetic rules operating on the incoming activation from connected nodes and from the node itself. Nodes can link to themselves using "gen loops" to store the states "true", "confirmed", and "failed". Activations as well as all link weights are real numbers, enabling real-valued information processing and neural learning. In this implementation, por/ret activation ∈ {−1, 0, 1} is used to control the flow of sub requests ({0, 1}) and sur confirmations. The real-valued sur activations can be interpreted as probabilistic in the range [0, 1], or take on the value −1 to indicate a definite failure. They control por/ret activation in the next higher layer.

Activation is spread in discrete steps. Each step consists of two phases: propagation and calculation. Propagation is simply z = W · a, where W is the matrix of link weights and a is the activation vector. Calculation is f_gate(f_node(z)), where f_gate is an activation function specified per link type and f_node implements the ReCoN behavior. The following definitions describe the behavior of the node function f_node per link type:

f_node^gen = { z^sur           if (z^gen · z^sub = 0) ∨ (∃ link_por ∧ z^por = 0)
             { z^gen · z^sub   otherwise

f_node^por = { 0               if z^sub ≤ 0 ∨ (∃ link_por ∧ z^por ≤ 0)
             { z^sur + z^gen   otherwise

f_node^ret = { 1               if z^por < 0
             { 0               otherwise

f_node^sub = { 0               if z^gen ≠ 0 ∨ (∃ link_por ∧ z^por ≤ 0)
             { z^sub           otherwise

f_node^sur = { 0                        if z^sub ≤ 0 ∨ (∃ link_por ∧ z^por ≤ 0)
             { (z^sur + z^gen) · z^ret  if ∃ link_ret
             { z^sur + z^gen            otherwise

An undesirable property of this implementation is that the new activation is not based solely on incoming activation, but also depends on local knowledge about the existence of por/ret links.¹

¹ The source code is available from https://github.com/joschabach/micropsi2. The formulas found in the code are slightly more complex, as they include additional features like time-based failing and searchability, which are not relevant in this context.
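For clarity, the node function can be transcribed as a single Python function over a node's net slot inputs. This is our illustration of the formulas above, not the MicroPsi2 source (which, as noted, is slightly more complex); the comments give our reading of the semantics of each rule.

```python
# The per-link-type node functions f_node, transcribed into Python (our
# illustration; see https://github.com/joschabach/micropsi2 for the source).

def fnode(z, has_por_link, has_ret_link):
    """Compute a node's gate outputs from its net slot inputs z.

    z maps slot types ("gen", "por", "ret", "sub", "sur") to summed incoming
    activation; has_por_link / has_ret_link encode the local knowledge about
    the existence of por/ret links mentioned above.
    """
    blocked = has_por_link and z["por"] <= 0            # predecessor not done
    out = {}
    # gen loop: latch the children's result while the node is requested
    if z["gen"] * z["sub"] == 0 or (has_por_link and z["por"] == 0):
        out["gen"] = z["sur"]
    else:
        out["gen"] = z["gen"] * z["sub"]
    # por: signal the successor once confirmation (or failure) is available
    out["por"] = 0 if z["sub"] <= 0 or blocked else z["sur"] + z["gen"]
    # ret: becomes 1 only when the predecessor has failed (negative por
    # input), which unlocks the predecessor's sur gate below
    out["ret"] = 1 if z["por"] < 0 else 0
    # sub: request children only while no result is latched in the gen loop
    out["sub"] = 0 if z["gen"] != 0 or blocked else z["sub"]
    # sur: report real-valued confirmation, or -1 for a definite failure;
    # a mid-sequence node (one with a ret link) is gated by its successor
    if z["sub"] <= 0 or blocked:
        out["sur"] = 0
    elif has_ret_link:
        out["sur"] = (z["sur"] + z["gen"]) * z["ret"]
    else:
        out["sur"] = z["sur"] + z["gen"]
    return out
```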
3.2 Using ReCoNs for an active perception task

Active perception with ReCoNs is based on learned representations that are encoded in the structure and weights of links. Whenever the system encounters a new situation (for instance, after a locomotion action), it will form a model of its environment as a combination of verifiable hypotheses. At higher levels, these hypotheses can contain object representations and geometric relations between objects; at the lower levels, features and eventually input patches that make up the visual structure of these objects. In the basic variant implemented so far, hypotheses are based on a single autoencoder that captures features of the scene as a whole. Hypotheses are verified by activating their top-most node with a request (i.e. by sending activation to its sub slot). From there, activation spreads downwards through the sub-hypotheses defined by the node's children, which are verified sequentially. If all existing hypotheses fail, the agent constructs a new one, scanning its visual field and connecting the localized feature nodes into a hierarchy.

We built a MicroPsi agent that moves about in a Minecraft world along a transition graph, samples its visual environment, and learns to recognize the scenes on the graph with a request confirmation network. The agent does not process the entire visual field at once, but uses a rectangular fovea with a 16 × 16 sensor matrix. Using actuators, it can move this fovea to fixate a point (and thus a 16 × 16 patch) in its field of vision. The possible fixation points are constrained to allow for a 50 percent overlap between neighboring patches. These patches are learned using a denoising autoencoder [8] with zero-masking Bernoulli noise and a corruption level of 0.3. A two-dimensional perspective projection of the Minecraft block world at a given position serves as input to the autoencoder. Figure 4b shows, for one such position, the visual field as the standard Minecraft client would present it to a human player (top), the projection provided as visual input to the agent (center), and a visualization of the features learned by the autoencoder at that position (bottom).

Figure 4: Action space and input sample of a MicroPsi2 agent in Minecraft. (a) Transition graph of the agent. (b) Training data: Minecraft vanilla client (top), visual input (center), learnt features (bottom).

The autoencoder's hidden nodes play the role of terminal nodes for the ReCoN units. Features detected by the hidden nodes are fed into a localized feature grid, in which each combination of feature and fixation point is expressed as a pair of por/ret-connected ReCoN units. Each pair is sur/sub-connected to a parent unit. The first node of the por/ret sequence is sub/sur-connected to the actuator node that was active when the feature was detected; the second has an incoming sur link from one of the hidden nodes of the autoencoder (see Figure 5a). The grid as a whole associates expected features with all possible fixation points. Each grid element represents a feature at a given location in the agent's visual field, together with the action required to reach that location. Since hypotheses are constructed as sequences of grid nodes, which contain actuators for moving the fovea, they are simple sensorimotor scripts. New hypotheses are formed from those features that are deemed relevant to the current situation; here, relevance is simply defined as an activation value exceeding a threshold of 0.8. Figure 5b shows the structure and connectivity of such a hypothesis.
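The wiring of a grid element and a hypothesis can be sketched as follows, reusing the ScriptNode and Edge sketches from section 2. The helper names (link, make_grid_element, make_hypothesis) are hypothetical; the MicroPsi2 implementation differs.

```python
# Sketch of the connectivity in Figure 5 (illustrative; helper names are
# hypothetical, and ScriptNode/Edge are the sketches from section 2).

def link(a, b, forward, backward, edges):
    """Create a typed link pair between two units."""
    edges.append(Edge(a, b, forward))
    edges.append(Edge(b, a, backward))

def make_grid_element(actuator, feature_terminal, edges):
    """One grid element: a por/ret pair of ReCoN units under a parent.

    The first unit requests the fovea movement (sub/sur to the actuator);
    the second is confirmed by an autoencoder hidden node via sur.
    """
    parent, fixation, feature = ScriptNode(), ScriptNode(), ScriptNode()
    link(parent, fixation, "sub", "sur", edges)    # parent has-part fixation
    link(parent, feature, "sub", "sur", edges)     # parent has-part feature
    link(fixation, feature, "por", "ret", edges)   # fixation precedes feature
    link(fixation, actuator, "sub", "sur", edges)  # move fovea to the location
    edges.append(Edge(feature_terminal, feature, "sur"))  # feature evidence
    return parent

def make_hypothesis(grid_parents, edges, threshold=0.8):
    """Chain the relevant grid elements into a por/ret sequence under a top
    node (cf. Figure 5b), yielding a simple sensorimotor script. Relevance
    is approximated here by the grid element's activation."""
    top = ScriptNode()
    relevant = [g for g in grid_parents if g.activation > threshold]
    for g in relevant:
        link(top, g, "sub", "sur", edges)
    for pred, succ in zip(relevant, relevant[1:]):
        link(pred, succ, "por", "ret", edges)
    return top
```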
To evaluate the functionality of this implementation, we let the agent move around in the Minecraft world among a number of fixed locations. As expected, the agent learned hypotheses for all locations, subsequently stopped forming new hypotheses, and successfully recognized all of the locations.

Figure 5: Connectivity schemas used for scene recognition. (a) An element of the feature grid. (b) Structure and connectivity of a hypothesis.

4 Summary and current work

Request confirmation networks (ReCoNs) provide an elegant way to integrate hierarchical scripts into neuro-symbolic architectures. They combine top-down/bottom-up processing with sequences to form sensorimotor scripts. We gave a definition of a ReCoN unit as a message passing state machine, and a second, neural definition using simple threshold elements. The implementation in our cognitive architecture MicroPsi2 uses dedicated nodes as ReCoN units. We tested our model using an autoencoder over visual input as the input layer, and let our agent learn and recognize scenes in the game Minecraft. The agent also uses these representations for symbolic operations such as protocol formation and basic policy learning. ReCoNs do not imply a novel way of learning, but a representational formalism, which also limits the ways in which we can evaluate them quantitatively. We are currently working on combining ReCoN units with long short-term memory [9] to learn the temporal/causal dynamics of the Minecraft environment, for instance to predict which hypothesis is the most likely candidate at any given time. We are also working on making hypotheses more robust against different types of variance: alternative views of a scene or object, the presence or absence of features, and the formation of hierarchies of hypotheses.

Acknowledgments

We would like to thank Dominik Welland and Ronnie Vuine (micropsi industries), who are vitally involved in our recent and ongoing research and development of request confirmation networks.

References

[1] Ioannis Hatzilygeroudis and Jim Prentzas. Neuro-symbolic approaches for knowledge representation in expert systems. International Journal of Hybrid Intelligent Systems, 1(3-4):111–126, 2004.
[2] Joscha Bach. Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition, volume 4. Oxford University Press, 2009.
[3] Joscha Bach. Modeling motivation in MicroPsi 2. In Artificial General Intelligence, pages 3–13. Springer, 2015.
[4] Daniel Short. Teaching scientific concepts using a virtual world: Minecraft. Teaching Science: Journal of the Australian Science Teachers Association, 58(3):55, 2012.
[5] Stefan Wermter and Ron Sun, editors. Hybrid Neural Systems. Springer Science & Business Media, 2000.
[6] Geoffrey G. Towell and Jude W. Shavlik. Knowledge-based artificial neural networks. Artificial Intelligence, 70(1):119–165, 1994.
[7] Roger C. Schank and Robert P. Abelson. Scripts, plans, and knowledge. Yale University, 1975.
[8] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.
[9] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[10] Paul J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339–356, 1988.
[11] Ronald J. Williams and David Zipser. Experimental analysis of the real-time recurrent learning algorithm. Connection Science, 1(1):87–111, 1989.