From Working Memory to Cognitive Control:
Presenting a Model for their Integration in a Bio-inspired Architecture

Michele Persiani, Alessio Mauro Franchi, Giuseppina Gini
DEIB, Politecnico di Milano
Milano, Italy
michele.persiani@mail.polimi.it, alessiomauro.franchi@polimi.it, giuseppina.gini@polimi.it

Abstract: The prefrontal cortex (PFC) is considered the brain area chiefly responsible for cognitive processes. It is adjacent to the sensory and motor cortices and, most importantly, is innervated by dopamine, the neurotransmitter associated with pleasure and reward. This setting allows neuronal ensembles belonging to the PFC to form associations between sensory cues, actions and reward, which is exactly what is needed for a control mechanism to emerge. To achieve cognitive control, an agent must be able both to perceive and to form associations between the perceived inputs and the available actions. These associations form the experience of an individual, thus shaping its behaviour. A fundamental process supporting cognition is offered by the working memory (WM), a small, short-term memory that contains goal-relevant pieces of information and protects them from interference. The WM exploits dopamine activity for two functions: as a gating signal, which determines when useful information can enter, and as a learning signal, which allows the memory to learn whether the currently stored information is good or not with respect to a certain situation and the ongoing task.

Grounding our work on biological and neuroscientific studies, we extend our Intentional Distributed Robotic Architecture (IDRA) [1] with a more powerful model of the memory, in particular exploiting the capabilities of the WM. IDRA is a bioinspired, modular, intentional architecture shaped like, and acting as, the amygdala-thalamo-cortical circuit in the human brain. The architecture deals mainly with two tasks: the storage of representations of the current situation, in a way similar to what the visual cortex does, and the autonomous generation of goals, starting from a set of hard-coded instincts. However, IDRA relies on an external Reinforcement Learning (RL) agent to perform actions and, more importantly, it lacks a task-driven memory system. We defined a new IDRA core module, called the Deliberative Module (DM), which adds a model of the WM. The DM can act as both WM storage and action generator, thanks to the introduction of a powerful chunk selection mechanism. A chunk is an object containing arbitrary information that competes for retention in an active memory storage. By transforming the problem of selecting actions into that of retaining chunks, we are able to exploit exactly the same mechanism for both the retention of chunks and the generation of actions, consequently dropping the RL agent previously in charge of generating the actions.

During each agent-environment interaction, the WM receives from sensors and inner processes the current state and a set of chunks of information proposed for retention. Its task is to select the best possible combination of them to maximize the future reward, which is estimated through a linear function approximator. The number of chunks that can be maintained in WM is small, 7 at maximum. Our WM model is composed of two modules, the first devoted to perception and the second to choice. It receives as input the set of possible chunks and outputs the content of the active memory, i.e., those chunks that are to be retained in memory. The perception stage builds a description of the currently perceived situation to obtain a sparse vector representing the state of the system in terms of percepts. The action selection stage selects the percepts to be kept as the WM content. This process is a form of context-sensitive learning, as percepts are selected depending on both the current state and the context. The perception process is a cascade of feature extraction and clustering aimed at classifying the current input in an unsupervised fashion, obtaining the corresponding percepts. It first applies Principal Components Analysis (PCA) to reduce the dimensionality of the problem, then Independent Components Analysis (ICA) to extract the independent components, and finally K-Means to cluster data in the IC space. In this way the raw input is transformed into a set of perceivable classes represented in sparse coding. The active memory stage has to discard the less useful percepts, taking into account the limited capacity of its memory. After training, the experience is codified as "rules" determining the module's retention policy.

We tested the WM model with available datasets to check whether the perception phase is able to create optimal features and clusters with respect to the input data, which can be produced by very heterogeneous sources. We compared our sensor processing pipeline, composed of PCA, ICA, and Softmax, against a baseline using Softmax only, on a heterogeneous classification dataset containing about 1500 entries from different sources (UCI repository), with nine classes. The results show that our pipeline outperforms the baseline, which is not able to distinguish some of the classes at all. In particular, the addition of ICA is fundamental for dealing with heterogeneous data. Other experiments more relevant for robotics have been executed as well, demonstrating good performance. Nevertheless, improvements are under way to integrate imitation learning in order to speed up the learning process.
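
The two stages of the WM model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the function names (`fit_perception`, `perceive`, `select_chunks`), the component and cluster counts, and the weight matrix are all assumptions made for illustration. Perception follows the stated PCA, ICA, K-Means cascade and returns a one-hot percept vector. Chunk selection scores every combination of at most 7 candidate chunks with a linear value estimate over (percept, chunk) conjunction features, which is one possible reading of "linear function approximator", and keeps the best one.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, FastICA

WM_CAPACITY = 7  # maximum number of retained chunks, as stated in the abstract


def fit_perception(X, n_components=3, n_percepts=4, seed=0):
    """Fit the PCA -> ICA -> K-Means cascade on raw inputs X (n_samples, n_features).

    n_components and n_percepts are illustrative values, not taken from the paper.
    """
    pca = PCA(n_components=n_components, random_state=seed).fit(X)
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    sources = ica.fit_transform(pca.transform(X))
    km = KMeans(n_clusters=n_percepts, n_init=10, random_state=seed).fit(sources)
    return pca, ica, km


def perceive(x, model):
    """Map one raw input to a sparse (one-hot) percept vector."""
    pca, ica, km = model
    sources = ica.transform(pca.transform(np.atleast_2d(x)))
    percept = np.zeros(km.n_clusters)
    percept[km.predict(sources)[0]] = 1.0
    return percept


def select_chunks(percept, n_candidates, weights, capacity=WM_CAPACITY):
    """Return (best_combination, value) under a linear value estimate.

    weights has shape (n_percepts, n_candidates) and stands in for the learned
    parameters of the approximator (hypothetical; learning is not shown here).
    Retaining nothing is given value 0.
    """
    best, best_value = (), 0.0
    for k in range(1, min(capacity, n_candidates) + 1):
        for combo in combinations(range(n_candidates), k):
            retained = np.zeros(n_candidates)
            retained[list(combo)] = 1.0
            value = float(percept @ weights @ retained)
            if value > best_value:
                best, best_value = combo, value
    return best, best_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for sensor data: 3 non-Gaussian sources mixed into 6 channels.
    X = rng.laplace(size=(300, 3)) @ rng.normal(size=(3, 6))
    model = fit_perception(X)
    percept = perceive(X[0], model)
    weights = rng.normal(size=(percept.size, 5))  # untrained, random weights
    print(percept, select_chunks(percept, 5, weights))
```

Exhaustive enumeration is used here only because the candidate set and the capacity are small. With purely additive weights, as in this sketch, the same result could be obtained by keeping the highest-scoring chunks with positive weight.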

[1] A. M. Franchi, F. Mutti, G. Gini, "From learning to new goal generation in a bioinspired robotic setup", Advanced Robotics, 2016, DOI 10.1080/01691864.2016.1172732


        Proceedings of EUCognition 2016 - "Cognitive Robot Architectures" - CEUR-WS                                                     67