From Working Memory to Cognitive Control: Presenting a Model for their Integration in a Bio-inspired Architecture

Michele Persiani, Alessio Mauro Franchi, Giuseppina Gini
DEIB, Politecnico di Milano
Milano, Italy
michele.persiani@mail.polimi.it, alessiomauro.franchi@polimi.it, giuseppina.gini@polimi.it

Abstract—The prefrontal cortex (PFC) is considered the brain area mainly responsible for cognitive processes. It is adjacent to the sensory and motor cortices and, most importantly, receives dopaminergic innervation. Dopamine is the neurotransmitter associated with pleasure and reward. This setting allows neuronal ensembles in the PFC to form associations between sensory cues, actions and reward, which is exactly what a control mechanism needs in order to emerge. To achieve cognitive control, an agent must be able both to perceive and to form associations between the perceived inputs and the available actions. These associations form the experience of an individual and thus shape its behaviour.

A fundamental process supporting cognition is the working memory (WM). The WM is a small short-term memory that holds goal-relevant pieces of information and protects them from interference. It exploits dopamine activity for two functions. The first is a gating signal, which determines when useful information can enter. The second is a learning signal, which lets the memory learn whether the currently stored information is good or not for a given situation and the ongoing task.

Grounding our work on biological and neuroscientific studies, we extend our Intentional Distributed Robotic Architecture (IDRA) [1] with a more powerful memory model that exploits, in particular, the capabilities of the WM. IDRA is a bio-inspired modular intentional architecture modelled on, and acting like, the amygdala-thalamo-cortical circuit of the human brain. It deals mainly with two tasks. The first is storing representations of the current situation, in a way similar to what the visual cortex does. The second is autonomously generating goals, starting from a set of hard-coded instincts. However, IDRA relies on an external Reinforcement Learning (RL) agent to perform actions and, most importantly, it lacks a task-driven memory system.

We defined a new IDRA core module, the Deliberative Module (DM), which includes a model of the WM. Thanks to a powerful chunk selection mechanism, the DM can act as both WM storage and action generator. A chunk is an object containing arbitrary information that competes for retention in an active memory storage. We transform the problem of selecting actions into that of retaining chunks. This lets us use exactly the same mechanism both to retain chunks and to generate actions, so we can drop the RL agent previously in charge of generating the actions.

During each agent-environment interaction, the WM receives the current state and a set of chunks of information proposed for retention, coming from sensors and internal processes. Its task is to select the combination of chunks that maximizes the future reward, estimated through a linear function approximator. The number of chunks that can be maintained in WM is small, at most 7.

Our WM model is composed of two modules, the first devoted to perception and the second to choice. It receives as input the set of candidate chunks and outputs the content of the active memory, i.e., the chunks that are to be retained. The perception stage builds a description of the currently perceived situation. The result is a sparse vector that represents the state of the system in terms of percepts. The action-selection stage then chooses the percepts to be kept as the WM content. This is a form of context-sensitive learning, since percepts are selected depending on both the current state and the context.

The perception process is a cascade of feature extraction and clustering that classifies the current input in an unsupervised fashion, yielding the corresponding percepts. It first applies Principal Component Analysis (PCA) to reduce the dimensionality of the problem. It then applies Independent Component Analysis (ICA) to extract the independent components. Finally, it uses K-Means to cluster the data in the IC space. In this way the raw input is transformed into a set of perceivable classes represented as a sparse code. The active memory stage has to discard the least useful percepts, taking into account the limited capacity of its memory. After training, the experience is encoded as “rules” that determine the module's retention policy.

We tested the WM model on available datasets to check whether the perception phase can create optimal features and clusters for the input data, which can come from very heterogeneous sources. We compared our sensor-processing pipeline, composed of PCA, ICA, and Softmax, with a baseline consisting of Softmax only. The comparison used a heterogeneous classification dataset of about 1500 entries from different sources (UCI repository), with nine classes. The results show that our pipeline outperforms the baseline, which is unable to distinguish some of the classes at all. In particular, the addition of ICA is fundamental for dealing with heterogeneous data. We also ran other experiments more relevant to robotics, which showed good performance. Nevertheless, improvements are under way to integrate imitation learning in order to speed up the learning process.

[1] A. M. Franchi, F. Mutti, G. Gini, “From learning to new goal generation in a bioinspired robotic setup”, Advanced Robotics, 2016, DOI 10.1080/01691864.2016.1172732

Proceedings of EUCognition 2016 - "Cognitive Robot Architectures" - CEUR-WS
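The final step of the perception cascade described above is K-Means in the IC space, followed by a sparse encoding of the winning cluster. It can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The PCA and ICA projections are assumed to be already computed. A library such as scikit-learn provides PCA and FastICA for that purpose.
- "Sparse coding" is read here as a one-hot vector over the K clusters.
- The farthest-first initialisation and the helper names kmeans, nearest, init_centroids and to_percept are hypothetical choices made for this example.

```python
import math


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def init_centroids(points, k):
    """Farthest-first initialisation (an assumption; the text does not specify one)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd's K-Means on points assumed to be already projected into the IC space."""
    centroids = init_centroids(points, k)
    for _ in range(max_iters):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids


def to_percept(p, centroids):
    """Sparse (one-hot) percept vector: 1 for the winning cluster, 0 elsewhere."""
    code = [0] * len(centroids)
    code[nearest(p, centroids)] = 1
    return code


if __name__ == "__main__":
    # Toy 2-D data standing in for inputs already mapped to the IC space.
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    centroids = kmeans(data, k=2)
    print(to_percept((0.1, 0.1), centroids))  # [1, 0]
    print(to_percept((5.0, 5.0), centroids))  # [0, 1]
```

Each input activates exactly one percept, which is one way to obtain the sparse, class-like representation the text describes. A softer code, such as distances to all centroids, would be an alternative design.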
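In the chunk selection step described above, the WM picks the combination of proposed chunks that maximizes a linear estimate of future reward, with at most 7 chunks retained. It might be sketched as an exhaustive search over subsets scored by Q(s, C) = w · φ(s, C). The following are all assumptions made for illustration:

- the feature map φ, with one block of state-weighted features plus a bias per chunk, active only when that chunk is retained;
- the "act_" prefix marking action chunks;
- the names features, q_value and select_chunks.

The source only states that the reward estimate is linear and that actions are handled as chunks.

```python
from itertools import combinations

CAPACITY = 7  # at most 7 chunks can be held in WM, as stated in the text


def features(state, chunk_set, chunk_ids):
    """Hypothetical feature map phi(s, C).

    For every known chunk id there is one block of features: the state vector
    followed by a bias term, all multiplied by 1 if the chunk is retained and by
    0 otherwise. A chunk's contribution therefore depends on the current state.
    """
    phi = []
    for cid in chunk_ids:
        on = 1.0 if cid in chunk_set else 0.0
        phi.extend(on * x for x in state)
        phi.append(on)  # per-chunk bias
    return phi


def q_value(weights, state, chunk_set, chunk_ids):
    """Linear estimate of future reward: Q(s, C) = w . phi(s, C)."""
    return sum(w * f for w, f in zip(weights, features(state, chunk_set, chunk_ids)))


def select_chunks(weights, state, candidates, chunk_ids, capacity=CAPACITY):
    """Score every subset of at most `capacity` candidates and return the best.

    The empty set is the starting point; ties keep the first subset found.
    """
    best = frozenset()
    best_q = q_value(weights, state, best, chunk_ids)
    for size in range(1, min(capacity, len(candidates)) + 1):
        for combo in combinations(candidates, size):
            q = q_value(weights, state, frozenset(combo), chunk_ids)
            if q > best_q:
                best, best_q = frozenset(combo), q
    return best, best_q


if __name__ == "__main__":
    # Two perceptual cues and one action chunk ("act_" prefix is a naming assumption).
    chunk_ids = ["cue_red", "cue_blue", "act_grasp"]
    # Weights per chunk block: [w_state0, w_state1, w_bias]
    weights = [1.0, 0.0, 0.0,   # cue_red is worth state[0]
               0.0, 1.0, 0.0,   # cue_blue is worth state[1]
               0.5, 0.0, -0.2]  # act_grasp pays off only when state[0] is high
    best, q = select_chunks(weights, [0.8, -0.3], chunk_ids, chunk_ids)
    print(sorted(best), round(q, 3))  # ['act_grasp', 'cue_red'] 1.0
```

Retaining the action chunk is what triggers the action, so one selection step serves both memory and behaviour. Exhaustive enumeration grows combinatorially with the number of candidates, so it is practical only for small candidate sets. With this additive feature map, ranking chunks individually would give the same answer. The subset search, however, also covers feature maps with interactions between chunks.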
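The learning role that the text attributes to dopamine can be illustrated with a temporal-difference (TD) update of the linear reward estimate. In this update, the prediction error δ = r + γ Q(s', C') − Q(s, C) stands in for the phasic dopamine signal, and the weights move as w ← w + α δ φ(s, C). This is a standard linear TD(0)/SARSA rule chosen for illustration. The source does not say which update the DM uses, and the names dot, td_update, alpha and gamma are assumptions.

```python
def dot(w, phi):
    """Inner product w . phi."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def td_update(w, phi, reward, phi_next, alpha=0.1, gamma=0.9, terminal=False):
    """One linear TD(0)/SARSA step on the weights of Q(s, C) = w . phi(s, C).

    phi      -- features of the chunk set retained at time t
    phi_next -- features of the chunk set retained at time t+1 (ignored if terminal)
    Returns the new weights and the prediction error delta, which here plays
    the role attributed to phasic dopamine (an illustrative assumption).
    """
    target = reward if terminal else reward + gamma * dot(w, phi_next)
    delta = target - dot(w, phi)
    new_w = [wi + alpha * delta * fi for wi, fi in zip(w, phi)]
    return new_w, delta


if __name__ == "__main__":
    # A one-step episode that always pays reward 1 for the chunk set encoded by [1, 0].
    w = [0.0, 0.0]
    for _ in range(200):
        w, delta = td_update(w, [1.0, 0.0], 1.0, [0.0, 0.0], terminal=True)
    print(round(w[0], 4), w[1])  # 1.0 0.0
```

Once the weights predict the reward correctly, δ shrinks towards zero and learning stops. This mirrors the common interpretation that dopamine signals only unexpected reward.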