Inner Speech for a Self-Conscious Robot

               A. Pipitone1[0000−0003−2388−5887], F. Lanza1[0000−0003−4382−6366],
            V. Seidita1,2[0000−0002−0601−6914], and A. Chella1,2[0000−0002−8625−708X]
            1 Dept. of Industrial and Digital Innovation (DIID), University of Palermo, Italy
            2 C.N.R., Institute for High-Performance Computing and Networking (ICAR),
                                            Palermo, Italy


              Abstract. The experience of self-conscious thinking in the verbose form
              of inner speech is a common one. Such a covert dialogue accompanies the
              introspection of mental life and fulfills important roles in our cognition,
              such as self-regulation, self-restructuring, and re-focusing on attentional
              resources. Although the functional underpinnings and the phenomenology
              of inner speech have been widely investigated in psychology and
              philosophy, robotic research generally does not address such a form of
              self-conscious behavior. Existing models of inner speech can inspire
              computational tools that provide the robot with a form of self-consciousness. Here,
              the most widespread psychological models of inner speech are reviewed,
              and a robot architecture implementing such a capability is outlined.

              Keywords: Inner Speech · Cognitive Architecture · Robot Self-Consciousness
              · Robot Thought.


1           Introduction
Inner speech plays a central role in daily life. A person thinks over her mental
states, perspectives, emotions and external events by generating thoughts in
the form of linguistic sentences. Talking to herself enables the person to pay
attention to internal and external resources, to control and regulate her behavior,
to retrieve memorized facts, to learn and store new information and, in general,
to simplify otherwise demanding cognitive processes [1].
    Moreover, inner speech allows restructuring the perception of the external
world and the perception of self by enabling high-level cognition, including self-
control, self-attention, and self-regulation.
    Although second-order thoughts may rely not on language but on, for example,
images or sensations, Bermudez [3] and Jackendoff [5], among others, argue that
genuine conscious thoughts need language. In the light of these considerations,
inner speech is an essential ingredient in the design of a self-conscious robot.
    We model such a necessary capability within a cognitive architecture for
robot self-consciousness by considering the underlying cognitive processes and
components of inner speech.
    It should be remarked that in the present paper such processes are considered
independently of the origin of the linguistic abilities, which are assumed to
have already been acquired by the robot.
   In Section 2 we give a brief overview of the cognitive models underlying
the proposed robot architecture, which is detailed in Section 3. Conclusions and
future work on the proposed robot architecture are discussed in Section 4.


2   Models of inner speech

Inner speech cannot be directly observed, which limits the scope for empirical
studies. However, several theoretical perspectives have been developed over the
last decades, and some of them are recognized across different research communities.
    Vygotsky [10] conceives inner speech as the outcome of a developmental
process during which linguistic interactions, such as those between a child and a
caregiver, are internalized. The linguistically mediated explanation for solving a
task thus becomes an internalized conversation with the self when the learner
is engaged in the same or similar cognitive tasks.
    Morin [7][8] claims that inner speech is intrinsically linked to self-awareness.
Self-focusing on an internal resource triggers inner speech, which in turn
generates self-awareness about that resource. Typical sources for the self-focus
process are social interactions and reflections from physical objects such as mirrors.
    Baddeley [2] discussed the roles of rehearsal and working memory, where
the different modules in the working memory are responsible for inner speech
rehearsal. In particular, the central executive oversees the process; the phono-
logical loop deals with spoken and written data, and the visuospatial sketchpad
deals with information in a visual or spatial form. The phonological loop is
composed of the phonological store for speech perception, which keeps information
in a speech-based form for a very short time (1-2 seconds), and of the
articulatory control process for speech production, which rehearses and stores
verbal information from the phonological store.
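
    As a minimal sketch of the phonological loop described above, the following
Python fragment models the phonological store as a buffer whose traces expire
after a fixed lifetime, and the articulatory control process as a function that
refreshes the surviving traces. The class and function names, and the 2-second
lifetime (taken here as the upper end of the 1-2 second range), are illustrative
assumptions and are not part of Baddeley's model.

```python
from dataclasses import dataclass, field

TRACE_LIFETIME = 2.0  # seconds; assumed upper end of the 1-2 s range


@dataclass
class PhonologicalStore:
    """Speech-based buffer whose traces fade unless they are rehearsed."""
    traces: dict = field(default_factory=dict)  # item -> time of last refresh

    def hear(self, item: str, now: float) -> None:
        # Storing an item again resets the age of its trace.
        self.traces[item] = now

    def contents(self, now: float) -> list:
        # Only traces refreshed within the lifetime are still available.
        return [w for w, t in self.traces.items() if now - t <= TRACE_LIFETIME]


def rehearse(store: PhonologicalStore, now: float) -> None:
    """Articulatory control process: re-articulate every surviving trace."""
    for item in store.contents(now):
        store.hear(item, now)


store = PhonologicalStore()
store.hear("cup", now=0.0)
store.hear("table", now=0.0)
rehearse(store, now=1.5)          # both traces are refreshed at t = 1.5
store.hear("door", now=1.5)
print(store.contents(now=3.0))    # all three traces are 1.5 s old
print(store.contents(now=4.0))    # every trace is now older than 2 s
```

    In this sketch an item survives only as long as rehearsal keeps refreshing
it, which is the role the text assigns to the articulatory control process.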
    Inner speech is usually conceived as the back-propagation of produced
sentences to an inner ear: a person thus hears again the internal voice she produces.
Steels [9] argued that language re-entrance allows the refinement of the syntax
emerging during linguistic interactions within a population of agents. The syntax
becomes more complex and complete when an agent parses the utterances it has
previously produced.
    Along the same lines, Clowes [4] discussed an artificial agent implemented by a
recurrent neural network whose output nodes are words interpreted as possible
actions (for example ‘up,’ ‘left,’ ‘right,’ ‘grab’). When such words are made
re-entrant by back-propagating the output to the input nodes, the agent achieves
the task in far fewer generations than in the control condition, where words are
not re-entrant.
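
    The re-entrant wiring can be illustrated with a minimal sketch, in which the
word emitted at one step is appended, one-hot encoded, to the sensory input of
the next step. The sketch shows the feedback connection only: the network, its
evolution and the task used by Clowes are not reproduced, and the function names,
the two-sensor input and the hand-set weights are hypothetical.

```python
WORDS = ["up", "left", "right", "grab"]


def one_hot(word):
    """Encode a word (or None) as a vector over WORDS."""
    return [1.0 if w == word else 0.0 for w in WORDS]


def step(sensors, previous_word, weights, re_entrant=True):
    """Pick the next word from the sensors and, optionally, the last word."""
    feedback = one_hot(previous_word if re_entrant else None)
    inputs = list(sensors) + feedback
    # One linear score per word; the highest-scoring word is emitted.
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return WORDS[scores.index(max(scores))]


# Hypothetical hand-set weights: 2 sensor inputs + 4 feedback inputs per word.
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # "up" follows sensor 0
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # "left" follows sensor 1
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # "right" is never preferred here
    [0.0, 0.0, 2.0, 0.0, 0.0, 0.0],  # "grab" follows a re-entrant "up"
]

word = None
for sensors in [(1.0, 0.0), (1.0, 0.0)]:
    word = step(sensors, word, weights)
    print(word)
```

    With identical sensor readings at both steps, the second word differs from
the first only because the first word is fed back; with re_entrant=False the
agent in this sketch would emit the same word twice.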


3   The cognitive architecture for inner speech

Figure 1 shows the proposed robot cognitive architecture for inner speech. Such
a representation refers to the Standard Model of the Mind proposed by Laird et
al. [6]. Here, the structure and processing of the Standard Model are decomposed
with the aim of integrating the components and the processes defined by the inner
speech theories previously discussed.


            Fig. 1. The proposed cognitive architecture for inner speech.


3.1   Perception and Action

The perception component of the proposed architecture includes the proprioception
module, related to the self-perception of the emotions (Emo), of the beliefs,
desires and intentions (BDI) and of the robot body (Body), and the exteroception
module, related to the perception of the outside environment.
    The proprioception module, according to Morin [7], is also stimulated by the
social milieu which, in the considered perspective, includes the social interactions
of the robot with the other entities in the environment: physical objects such as
mirrors and cameras, and other robots or humans, through face-to-face
interactions that foster self-world differentiation.
    The motor module is decomposed into three sub-components: the Action module,
the Covert Articulation module (CA) and the Self Action module (SA). In
particular:

 – The Action module represents the actions the agent performs on the outside
   world producing modifications to the external environment (not including
   the self) and the working memory.
 – The Covert Articulation (CA) module rehearses information from the Phonological
   Store (PS), i.e., the perceptual buffer for speech-based data considered
   as a sub-component of the short-term memory (see below). Such a module
   acts as the inner voice heard by the phonological store, circulating information
   in a loop. In this way, inner speech links the covert articulation to
   the phonological store in a closed loop.
 – The Self Action (SA) module represents the actions that the agent performs
   on itself, i.e., self-regulation, self-focusing, and self-analysis.
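
    The three sub-components can be sketched as separate handlers behind one
motor interface. This is a minimal illustration under stated assumptions: the
class name, the list-based stores and the dictionary used for the self state are
hypothetical choices made for the example, not the implementation of the
architecture.

```python
class MotorModule:
    """Hypothetical motor module with Action, CA and SA sub-components."""

    def __init__(self):
        self.environment = []          # effects on the outside world
        self.working_memory = []       # also modified by external actions
        self.phonological_store = []   # perceptual buffer for speech-based data
        self.self_state = {"focus": None}

    def action(self, command):
        # Action: modifies the external environment (not the self)
        # and the working memory.
        self.environment.append(command)
        self.working_memory.append(command)

    def covert_articulation(self):
        # CA: rehearse the most recent item so that the store hears it again.
        if self.phonological_store:
            self.phonological_store.append(self.phonological_store[-1])

    def self_action(self, resource):
        # SA: an action of the agent on itself, here self-focusing.
        self.self_state["focus"] = resource


motor = MotorModule()
motor.phonological_store.append("I am holding the cup")
motor.covert_articulation()      # the sentence re-enters the store
motor.self_action("Body")        # focus on the robot body
motor.action("release cup")
```

    Keeping the three handlers apart mirrors the distinction drawn in the list:
only action touches the outside world, while covert_articulation and
self_action change internal state only.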

3.2     The Memory System
The memory structure, inspired by the Standard Model of the Mind, is divided
into three types of memory: the short-term memory (STM), the procedural
and the declarative long-term memory (LTM), and the working memory system
(WMS).
    The short-term memory holds sensory information about the environment in
which the robot is immersed, previously coded and integrated with information
coming from perception. As previously mentioned, the short-term memory
includes the phonological store.
    Information flow from perception to STM allows storing the aforementioned
coded signals. In particular, information from perception to the phonological
store is related to conscious thoughts from exteroception, and to self-conscious
thoughts from proprioception.
    The information flow from the working memory system to perception pro-
vides expectations or possible hypotheses that are employed for influencing the
attention process. In particular, the flow from the phonological store to propri-
oception enables the self-focus modality.
    The long-term memory holds learned behaviors, semantic knowledge, and
experience. In the considered case, the declarative LTM contains the linguistic
information in terms of lexicon and grammatical structures, i.e., the LanguageLTM
memory. The declarative linguistic information is assumed to be already acquired,
as specified above, and represents the grammar of the robot. Moreover, the Episodic
Long-Term Memory (EBLTM) is the declarative long-term memory component
which communicates with the Episodic Buffer (EB), the component of the working
memory system that acts as a ‘backup’ store of long-term memory data.
    The procedural LTM contains the composition rules according to which the
linguistic structures are arranged for producing sentences at different levels of
completeness and complexity. Procedures do not concern only the grammatical
plausibility of the structures: rules concerning the regulation, the focusing
and the restructuring of resources within the whole environment (including
the self) are also to be considered.
    Finally, the working memory system holds task-specific information ‘chunks’
and feeds them to the cognitive processes during task execution, step
by step according to the cognitive cycle of the Standard Model of the Mind. The
working memory system deals with cognitive tasks such as mental arithmetic
and problem-solving. The Central Executive (CE) sub-component manages and
controls the linguistic information of the rehearsal loop by integrating (i.e.,
combining) data from the phonological loop and by drawing on data held in
the long-term memory.
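
    The memory components listed in this section can be laid out as plain data
structures. The following is a minimal sketch, assuming list- and set-based
containers and a deliberately naive central executive that keeps only the heard
words found in the lexicon; the field names and the selection rule are
hypothetical and stand in for the rules that the architecture leaves to the
LTMs.

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermMemory:
    phonological_store: list = field(default_factory=list)  # speech-based data


@dataclass
class DeclarativeLTM:
    lexicon: set = field(default_factory=set)      # LanguageLTM: words
    grammar: list = field(default_factory=list)    # LanguageLTM: structures
    episodes: list = field(default_factory=list)   # EBLTM


@dataclass
class ProceduralLTM:
    rules: list = field(default_factory=list)      # composition rules


@dataclass
class WorkingMemorySystem:
    episodic_buffer: list = field(default_factory=list)  # EB
    chunks: list = field(default_factory=list)           # task-specific chunks

    def central_executive(self, stm, declarative):
        """CE: combine phonological data with data held in the LTM."""
        heard = [w for s in stm.phonological_store for w in s.split()]
        self.chunks = [w for w in heard if w in declarative.lexicon]
        return self.chunks


stm = ShortTermMemory(["the cup is red"])
ltm = DeclarativeLTM(lexicon={"cup", "red"})
wms = WorkingMemorySystem()
print(wms.central_executive(stm, ltm))  # heard words that are in the lexicon
```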

3.3   The Cognitive Cycle
In brief, a cognitive cycle starts with perception, which converts external signals
into linguistic data and holds them in the phonological store. The central
executive manages the inner thinking process by enabling the working memory
system to selectively attend to some stimuli or ignore others, according to the
rules stored within the LTMs, and by orchestrating the phonological loop as a
slave system.
    At this stage, a conscious thought emerges as the result of a single round
between the phonological store and the covert articulation, triggered by the
phonological loop once the central executive has retrieved the data for the process.
The phonological loop enables the covert articulation, which acts as a motor for
the internal production and whose output stream is heard by the phonological
store. The output stream also affects the self, which is then regulated and
restructured.
    Once the conscious thought is elicited by inner speech, the perception of the
new context could take place, repeating the cognitive cycle.
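
    The steps of the cycle can be summarized in a short sketch. It is a
simplification under stated assumptions: perception is reduced to lower-casing a
string, attention to a lexicon lookup, and the effect on the self to recording
the last thought; the function name and its parameters are hypothetical.

```python
def cognitive_cycle(signal, lexicon, phonological_store, self_state):
    """One hypothetical cycle: perceive, attend, articulate, regulate."""
    # 1. Perception converts the external signal into linguistic data
    #    and holds it in the phonological store.
    phonological_store.append(signal.lower())
    # 2. The central executive attends only to words known to the LTM.
    attended = [w for w in phonological_store[-1].split() if w in lexicon]
    # 3. One round of the phonological loop: covert articulation produces
    #    the thought, which is heard again by the phonological store.
    thought = " ".join(attended)
    phonological_store.append(thought)
    # 4. The output stream also affects the self.
    self_state["last_thought"] = thought
    return thought


store, me = [], {}
lexicon = {"cup", "is", "red"}
print(cognitive_cycle("The cup is RED", lexicon, store, me))
```

    Calling the function again with a new signal corresponds to the perception
of the new context that restarts the cycle.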


4     Conclusions
In this paper, a cognitive architecture for inner speech is presented. It
is based on the Standard Model of the Mind, which was decomposed in order to include
some typical components of the models of human inner speech.
    The working memory system of the architecture includes the phonological
loop considered by Baddeley as the main component for storing spoken and
written information and for implementing the cognitive rehearsal process.
    The covert dialogue is modeled as a loop in which the phonological store hears
the inner voice produced by the covert articulation process. The central executive
is the master system which drives the whole system.
    By retrieving linguistic information from the long-term memory, the central
executive contributes to creating the linguistic thought, whose surface form
emerges from the phonological loop.


Acknowledgments
This material is based upon work supported by the Air Force Office of Scientific
Research under award number FA9550-17-1-0232.


References
1. Alderson-Day, B., Fernyhough, C.: Inner Speech: Development, Cognitive Functions,
   Phenomenology, and Neurobiology. Psychological Bulletin 141(5), 931–965 (2015)
2. Baddeley, A.: Working Memory. Science 255(5044), 556–559 (1992)
3. Bermudez, J.L.: The Paradox of Self-Consciousness. MIT Press, Cambridge, MA
   (1998)
4. Clowes, R.: A Self-Regulation Model of Inner Speech and its Role in the Organisation
   of Human Conscious Experience. Journal of Consciousness Studies 14(7), 59–71
   (2007)
5. Jackendoff, R.: How Language Helps Us Think. Pragmatics & Cognition 4(1), 1–34
   (1996)
6. Laird, J.E., Lebiere, C., Rosenbloom, P.S.: A Standard Model of the Mind: To-
   ward a Common Computational Framework across Artificial Intelligence, Cognitive
   Science, Neuroscience, and Robotics. AI Magazine Winter 2017, 13–26 (2017)
7. Morin, A.: A Neurocognitive and Socioecological Model of Self-Awareness. Genetic,
   Social, and General Psychology Monographs 130(3), 197–222 (2004)
8. Morin, A.: Possible Links Between Self-Awareness and Inner Speech. Journal of
   Consciousness Studies 12(4-5), 115–134 (2005)
9. Steels, L.: Language Re-Entrance and the ‘Inner Voice.’ Journal of Consciousness
   Studies 10(4-5), 173–185 (2003)
10. Vygotsky, L.: Thought and Language. Revised and expanded edition. MIT Press,
   Cambridge, MA (2012)