From Working Memory to Cognitive Control: Presenting a Model for their Integration in a Bio-inspired Architecture

Abstract - The prefrontal cortex (PFC) is considered the brain area chiefly responsible for cognitive processes. It is adjacent to the sensory and motor cortices and, most importantly, is innervated by dopamine, the neurotransmitter associated with pleasure and reward. This arrangement allows neuronal ensembles in the PFC to form associations between sensory cues, actions, and reward, which is exactly what is needed for a control mechanism to emerge. To exert cognitive control, an agent must be able both to perceive and to form associations between the perceived inputs and the available actions. These associations form the experience of an individual, thus shaping the individual's behaviour. A fundamental process supporting cognition is provided by the working memory (WM), a small, short-term memory that holds goal-relevant pieces of information and protects them from interference. The WM exploits dopamine activity for two functions: as a gating signal, which determines when useful information can enter, and as a learning signal, which allows the memory to learn whether the currently stored information is useful for the current situation and the ongoing task. Grounding our work on biological and neuroscientific studies, we extend our Intentional Distributed Robotic Architecture (IDRA) [1] with a more powerful model of memory, in particular exploiting the capabilities of the WM. IDRA is a bio-inspired, modular, intentional architecture shaped like, and acting as, the amygdala-thalamo-cortical circuit in the human brain. The architecture deals mainly with two tasks: storing representations of the current situation, in a way similar to what the visual cortex does, and autonomously generating goals, starting from a set of hard-coded instincts. However, IDRA relies on an external Reinforcement Learning (RL) agent to perform actions and, more importantly, lacks a task-driven memory system.
[1] A. M. Franchi, F. Mutti, G. Gini, "From learning to new goal generation in a bioinspired robotic setup", Advanced Robotics, 2016, DOI 10.1080/01691864.2016.1172732
We defined a new IDRA core module, called the Deliberative Module (DM), which adds a model of the WM. The DM can act both as WM storage and as action generator, thanks to the introduction of a powerful chunk selection mechanism. A chunk is an object containing arbitrary information that competes for retention in an active memory storage. By transforming the problem of selecting actions into that of retaining chunks, we can exploit exactly the same mechanism for both the retention of chunks and the generation of actions, and consequently drop the RL agent previously in charge of generating the actions. During each agent-environment interaction, the WM receives from sensors and inner processes the current state and a set of chunks of information proposed for retention. Its task is to select the best possible combination of them to maximize the future reward, which is estimated through a linear function approximator. The number of chunks that can be maintained in the WM is small, at most 7. Our WM model is composed of two modules, the first devoted to perception and the second to choice. It receives as input the set of candidate chunks and outputs the content of the active memory, i.e., the chunks that are to be retained. The perception stage builds a description of the currently perceived situation to obtain a sparse vector representing the state of the system in terms of percepts. The action-selection stage selects the percepts to be kept as the WM content. This process is a form of context-sensitive learning, as percepts are selected depending on both the current state and the context. The perception process is a cascade of feature extraction and clustering aimed at classifying the current inputs in an unsupervised fashion, obtaining their corresponding percepts.
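As an illustration of the chunk-selection step described in this paragraph, the following is a minimal sketch, not the authors' implementation. It assumes a binary percept vector as state, integer chunk identifiers, a hypothetical feature map with one feature per (active percept, retained chunk) pair plus a bias, exhaustive search over subsets of at most 7 chunks, and a TD(0)-style weight update. The names `features`, `select_chunks`, and `td_update` and the parameters `alpha` and `gamma` are all assumptions introduced here.

```python
from itertools import combinations

WM_CAPACITY = 7  # the text states that at most 7 chunks can be retained


def features(state, subset, n_chunks):
    """Hypothetical feature map (an assumption, not from the source):
    one feature per (active percept, retained chunk) pair, plus a bias."""
    phi = [0.0] * (len(state) * n_chunks + 1)
    phi[-1] = 1.0  # bias term
    for i, s in enumerate(state):
        if s:
            for c in subset:
                phi[i * n_chunks + c] = float(s)
    return phi


def value(weights, phi):
    """Linear function approximator: estimated future reward = w . phi."""
    return sum(w * f for w, f in zip(weights, phi))


def select_chunks(weights, state, candidates, n_chunks, capacity=WM_CAPACITY):
    """Return the subset of candidate chunks (size <= capacity) with the
    highest estimated value, together with that value."""
    best_subset = ()
    best_value = value(weights, features(state, (), n_chunks))
    for size in range(1, min(capacity, len(candidates)) + 1):
        for subset in combinations(candidates, size):
            v = value(weights, features(state, subset, n_chunks))
            if v > best_value:
                best_subset, best_value = subset, v
    return best_subset, best_value


def td_update(weights, phi, reward, next_value, alpha=0.1, gamma=0.9):
    """TD(0)-style update of the weights (assumed learning rule)."""
    delta = reward + gamma * next_value - value(weights, phi)
    return [w + alpha * delta * f for w, f in zip(weights, phi)]
```

With a purely additive feature map such as this one, the exhaustive search could be replaced by keeping the highest-weighted chunks; the enumeration is kept only because it stays valid for richer, non-additive features.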
The perception process first applies Principal Component Analysis (PCA) to reduce the dimensionality of the problem, then Independent Component Analysis (ICA) to extract the independent components (ICs), and finally K-Means to cluster the data in the IC space. In this way the raw input is transformed into a set of perceivable classes represented in sparse coding. The active memory stage has to discard the less useful percepts, taking into account the limited capacity of its memory. After training, the experience is codified as "rules" determining the module's retention policy. We tested the WM model on available datasets to check whether the perception phase is able to create effective features and clusters for input data that can come from very heterogeneous sources. We compared our sensor-processing pipeline, composed of PCA, ICA, and Softmax, with a Softmax-only baseline on a heterogeneous classification dataset containing about 1500 entries from different sources (UCI repository), with nine classes. The results show that our pipeline outperforms the baseline, which fails to distinguish some of the classes at all. In particular, the addition of ICA is fundamental for dealing with heterogeneous data. Other experiments more relevant to robotics have also been carried out, showing good performance. Nevertheless, work is under way to integrate imitation learning in order to speed up the learning process.
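The perception cascade described above (PCA, then ICA, then K-Means, with percepts in sparse coding) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the abstract does not name the ICA algorithm, so symmetric FastICA with a tanh contrast is assumed, K-Means uses plain Lloyd iterations with random initialisation, and the sparse code is taken to be a one-hot vector over clusters. The function names and the parameters `n_components`, `n_percepts`, and `seed` are hypothetical.

```python
import numpy as np


def pca_whiten(X, n_components):
    """Center X and return whitened scores on the top principal components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Columns of U are orthonormal, so this scaling gives unit variance.
    return U[:, :n_components] * np.sqrt(X.shape[0] - 1)


def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(np.clip(s, 1e-12, None))) @ u.T @ W


def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh contrast, an assumed choice) on whitened Z."""
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current source estimates, shape (n, k)
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point step: w+ = E[z g(w.z)] - E[g'(w.z)] w, one row per unit
        W_new = sym_decorrelate((G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W)
        delta = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


def k_means(Y, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def perceive(X, n_components, n_percepts, seed=0):
    """PCA -> ICA -> K-Means; returns one one-hot percept vector per input row."""
    Z = pca_whiten(X, n_components)
    Y = fast_ica(Z, seed=seed)
    labels, _ = k_means(Y, n_percepts, seed=seed)
    return np.eye(n_percepts)[labels]
```

For example, `perceive(X, n_components=3, n_percepts=4)` on an array `X` of shape `(n_samples, n_features)` returns an `(n_samples, 4)` matrix in which each row has a single 1 marking the percept assigned to that input.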

Proceedings of EUCognition 2016 - "Cognitive Robot Architectures" - CEUR-WS