=Paper=
{{Paper
|id=Vol-2287/paper12
|storemode=property
|title=Accounting for the Minimal Self and the Narrative Self: Robotics Experiments Using Predictive Coding
|pdfUrl=https://ceur-ws.org/Vol-2287/paper12.pdf
|volume=Vol-2287
|authors=Jun Tani
|dblpUrl=https://dblp.org/rec/conf/aaaiss/Tani19
}}
==Accounting for the Minimal Self and the Narrative Self: Robotics Experiments Using Predictive Coding==
Jun Tani
Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan 904-0495
jun.tani@oist.jp

Abstract. This paper proposes that the mind comprises emergent phenomena that appear via intricate and often conflicting interactions between top-down intentional processes involved in proactively acting on the external world, and bottom-up recognition processes involved in inferring possible causes of the resultant perceptual reality. This view has been tested in a series of neurorobotics experiments employing predictive coding principles implemented in "deep" recurrent neural network (RNN) models. The current paper illuminates phenomenological accounts of the minimal self and the narrative self through analysis of those synthetic neurorobotics experiments.

Keywords: Predictive Coding, Robot, RNN, Self, Consciousness.

1 Introduction

If we suppose that entangled interactions between top-down intention and bottom-up recognition of perceptual reality from the objective world are essential to the development of embodied minds, what sorts of models could best account for such dynamic interactions? The developmental psychologists Gibson and Pick [1] suggest that learning an action is not just about learning a motor command sequence; it also involves learning the possible perceptual structures extracted during intentional interactions with the environment. This view can be explained by predictive coding [2].

In predictive coding frameworks, the learning process extracts the causal structure between the intention for action and the resultant perceptual reality by means of prediction error minimization. It has been suggested that brains utilize specific macroscopic constraints, such as timescale differences and connectivity among local regions, in terms of downward causation for the development of composition/decomposition mechanisms [3]. It is therefore expected that a predictive coding framework implemented in neural network models provided with such spatio-temporal constraints could extract hidden causal structure in a compositional manner from accumulated sensory-motor experiences [4]. Our group conducted neurorobotics experiments following these considerations [5-7]. The current review of those robotics experiments suggests that the phenomenology of self and consciousness is best explained by self-organizing phenomena that emerge through intricate interaction between top-down intentional processes and bottom-up perceptual reality.

2 Neurorobotics experiments

Yamashita and Tani [8] proposed a predictive coding recurrent neural network model, known as the multiple timescale RNN (MTRNN), which is composed of stacks of continuous-time RNNs (CTRNNs), each assigned a different timescale. This model has since been extended to a dynamic visual processing predictive coding model characterized by multiscale properties in both the temporal and the spatial dimension, the latter in terms of the range of local connectivity [6]. Furthermore, Hwang et al. [7] integrated this visual processing predictive coding model with the MTRNN for the purpose of conducting human-robot interaction experiments. The following provides a review of this study.
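To make the timescale constraint concrete, the following is a minimal sketch of a two-layer CTRNN stack, written in Python with JAX; it illustrates the general idea only, not the published MTRNN implementation. Each layer integrates its internal state at a rate set by its time constant tau, so the lower layer changes quickly while the higher layer changes slowly. All layer sizes, time constants, weights, and the dummy input are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def ctrnn_step(u, x_in, W_rec, W_in, tau):
    """One Euler step of a CTRNN layer: a larger time constant tau
    makes the internal state u update more slowly."""
    y = jnp.tanh(u)
    du = (-u + W_rec @ y + W_in @ x_in) / tau
    return u + du

key = jax.random.PRNGKey(0)
n_fast, n_slow, n_obs = 20, 10, 8                      # illustrative sizes
k1, k2, k3, k4 = jax.random.split(key, 4)
W_in = 0.1 * jax.random.normal(k1, (n_fast, n_obs))    # input -> fast layer
W_fast = 0.1 * jax.random.normal(k2, (n_fast, n_fast)) # fast recurrence
W_up = 0.1 * jax.random.normal(k3, (n_slow, n_fast))   # fast -> slow (bottom-up)
W_slow = 0.1 * jax.random.normal(k4, (n_slow, n_slow)) # slow recurrence

u_fast, u_slow = jnp.zeros(n_fast), jnp.zeros(n_slow)
for t in range(100):
    obs = jnp.sin(0.2 * t) * jnp.ones(n_obs)           # dummy perceptual input
    u_fast = ctrnn_step(u_fast, obs, W_fast, W_in, tau=2.0)               # fast
    u_slow = ctrnn_step(u_slow, jnp.tanh(u_fast), W_slow, W_up, tau=20.0) # slow
```

Stacking layers with increasing tau in this way is what lets slowly changing higher-level activity serve as an abstract context over quickly changing lower-level dynamics.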
2.1 P-VMDNN Model

Fig. 1 illustrates the integrated model, the predictive visuo-motor dynamic neural network (P-VMDNN) [7], together with the experiment setup using a simulated humanoid robot and part of the information flow. The P-VMDNN consists of a visual pathway, a proprioceptive (movement) pathway, and associative layers. Each pathway consists of a stack of CTRNNs, in which neural units in the lower layers have smaller time constants while those in higher layers have larger time constants. In addition to this timescale constraint, the visual pathway employs a spatial-scale constraint in terms of connectivity range: as in a convolutional neural network, retinotopically allocated neural units are connected only locally in the lower layers, while units in the higher layers are fully connected within the same layer. The associative layers also comprise CTRNNs whose units have large time constants.

The whole network functions as a generative model. The current internal state in the highest associative layer, which encodes the current intention, dynamically drives the internal states in the next layer down through top-down connectivity. The drive propagates downward through the layers of both pathways, finally generating a prediction of the next time step's pixel pattern and joint angles in the lowest layers of the visual and proprioceptive pathways, respectively.

Training of the P-VMDNN is conducted in an end-to-end supervised manner. During the robot tutoring phase, a set of visuo-proprioceptive sequences consisting of video frames and arm joint trajectories is sampled. Training enables the network to regenerate the exemplar sequence patterns by adapting the connectivity weights of the whole network, using the backpropagation-through-time (BPTT) algorithm [9]. Training on multiple sequence patterns exploits the initial-state sensitivity of the network's nonlinear dynamics: BPTT infers optimal values for the initial internal states in all layers for each sequence, as well as for all connectivity weights. After training converges, each training sequence can be regenerated by top-down prediction, by setting the initial internal states to the values inferred for that sequence during training. In this sense, the initial states can be said to represent the intention to generate the corresponding sequence. Recognition of a particular sequence pattern after learning can then be conducted by inferring the corresponding intention, that is, the initial states that reconstruct the sequence pattern, while keeping the connectivity weights fixed.
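As a rough illustration of this training scheme, the sketch below jointly optimizes shared weights and one initial state vector per training sequence by gradient descent on the prediction error; autodiff through the unrolled network plays the role of BPTT. The single-layer RNN, the sinusoidal dummy targets, and all hyperparameters are assumptions made for brevity, not the published P-VMDNN configuration.

```python
import jax
import jax.numpy as jnp

def unroll(params, u0, T):
    """Generate T prediction steps top-down from the initial state u0."""
    W_rec, W_out = params
    def step(u, _):
        u = jnp.tanh(W_rec @ u)
        return u, W_out @ u            # carry next state, emit prediction
    _, preds = jax.lax.scan(step, u0, None, length=T)
    return preds

def loss(params, u0s, targets):
    """Mean prediction error over all sequences; differentiating through
    the unrolled generation implements BPTT."""
    T = targets.shape[1]
    preds = jax.vmap(lambda u0: unroll(params, u0, T))(u0s)
    return jnp.mean((preds - targets) ** 2)

key = jax.random.PRNGKey(0)
n_state, n_obs, n_seq, T = 16, 4, 3, 50                     # illustrative sizes
k1, k2, k3 = jax.random.split(key, 3)
params = (0.1 * jax.random.normal(k1, (n_state, n_state)),  # W_rec
          0.1 * jax.random.normal(k2, (n_obs, n_state)))    # W_out
u0s = 0.1 * jax.random.normal(k3, (n_seq, n_state))  # one initial state per sequence

# Dummy targets: three sinusoidal "movement" sequences at different frequencies.
freqs = 0.1 * (1.0 + jnp.arange(n_seq))
targets = jnp.broadcast_to(
    jnp.sin(freqs[:, None, None] * jnp.arange(T)[None, :, None]),
    (n_seq, T, n_obs))

grad_fn = jax.jit(jax.grad(loss, argnums=(0, 1)))
for _ in range(200):
    g_params, g_u0s = grad_fn(params, u0s, targets)
    params = tuple(p - 0.05 * g for p, g in zip(params, g_params))
    u0s = u0s - 0.05 * g_u0s    # per-sequence "intentions" are inferred as well
```

After convergence, feeding a learned u0s[i] back into unroll regenerates sequence i top-down, which is the sense in which the initial state encodes the intention for that sequence.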
Fig. 1. The P-VMDNN model (a), a simulated iCub imitating visually perceived human movement patterns (b), and the underlying mechanism of compositional movement generation (c).

2.2 Learning to imitate compositional movement patterns

In the imitation learning task shown in Fig. 1a, tutoring of the simulated robot was conducted as follows. Three human subjects volunteered to play the role of imitatees. The volunteers were instructed in 9 different cyclic two-arm movement patterns serving as movement primitives. Each subject then generated 9 different ways of concatenating 3 movement primitives arbitrarily selected from the 9 predefined primitives, so that 27 three-primitive concatenated patterns were demonstrated in total. While the robot visually perceived each concatenated pattern demonstrated by a subject, a tutor guided the movement trajectories of both robot arms by synchronously imitating the subject's movements. In this manner, 27 visuo-proprioceptive sequences were sampled for training the network. Network training was conducted for 50,000 epochs. Fig. 2 shows examples of imitated movement patterns generated after training.

Fig. 2. Imitative generation of differently concatenated movement primitive sequences.

The concatenated movements were generated as pantomime-like behaviors, with the corresponding visual imagery generated by providing the initial internal state values obtained via training. Each column of Fig. 2 shows the imitation of a movement pattern demonstrated by a different subject; the labels above indicate the subject ID and the three-primitive concatenated pattern, with numbers indicating the corresponding primitives. For each column, the top, middle, and bottom rows show neural activity after PCA in the visual higher layer, the visual lower layer, and the proprioceptive lower layer, respectively. Each concatenation of cyclic movement primitives is visible in the lower visual and proprioceptive layers, whereas those sequences are represented abstractly, with more slowly changing profiles, in the higher layer.

Further analysis of network activity converged on an understanding of the underlying mechanism for generating compositional movements (Fig. 1c). Slowly changing profiles develop differently from different initial states in the higher layer (illustrated as "plan-1" and "plan-2" in Fig. 1c). Meanwhile, in the lower layers of both the visual and proprioceptive pathways, a set of movement primitives is self-organized as limit cycle attractors (three movement primitives are illustrated in Fig. 1c, middle). The higher layer manipulates the lower layers by feeding them bifurcation parameters that change slowly in a specific manner, inducing sequential transitions from one limit cycle to another. Consequently, movement patterns can be generated in a compositional yet fluid manner, exploiting nonlinear dynamic characteristics including initial-state sensitivity, bifurcation, and self-organization.

2.3 Online inference of intentions of others

This subsection describes an experiment involving online inference of the intentions of human subjects during synchronous imitation [6]. In this experiment, only the visual pathway, consisting of 6 layers, was used. The network was trained to predict visually perceived movement patterns, using 6 non-concatenated movement primitive patterns generated by 5 subjects. After learning, the network's task was to continue predicting the next-step visual image demonstrated by the human subjects while they occasionally switched from one trained movement pattern to another. To follow such sudden switching, a scheme called online error regression [10] was used. In this scheme, a window of a certain step length over the immediate past is maintained. When a prediction error is generated, the error is back-propagated through time to the onset of this window in order to update the initial internal states in the direction of error minimization. This corresponds to online inference of the intentions of the movement demonstrators.
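The sketch below illustrates the spirit of this error regression scheme; it continues from the previous sketch, reusing unroll, params, u0s, and targets, and its window length, iteration count, and learning rate are illustrative assumptions. The connectivity weights stay fixed, and only the internal state at the onset of a short past window is updated by gradient descent on the windowed prediction error.

```python
WINDOW, N_ITERS, LR = 20, 30, 0.1      # illustrative settings

def window_error(u_onset, params, window_obs):
    """Prediction error over the immediate-past window, regenerated
    top-down from the candidate onset state u_onset."""
    preds = unroll(params, u_onset, window_obs.shape[0])
    return jnp.mean((preds - window_obs) ** 2)

def infer_intention(u_onset, params, window_obs):
    """Online error regression: weights stay fixed; only the window-onset
    state (the inferred intention) is updated to minimize the error."""
    grad_fn = jax.grad(window_error)
    for _ in range(N_ITERS):
        u_onset = u_onset - LR * grad_fn(u_onset, params, window_obs)
    return u_onset

# If the demonstrator suddenly switches patterns, the error inside the window
# rises; re-inferring the onset state shifts the internal state so that the
# top-down predictions start to follow the newly observed pattern.
window_obs = targets[1, :WINDOW]       # pretend the observed pattern switched
u_new = infer_intention(u0s[0], params, window_obs)
```

In an online setting this window slides forward with time, so the inferred onset state is continually revised whenever the prediction error rises.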
Fig. 3 shows an instance of online inference in which, from top to bottom, are shown the target visual image and its prediction after PCA, the prediction error, and the internal states in the higher and lower layers. The movement pattern in the target is switched near step 290, which is accompanied by a sharp rise in error. However, the error is reduced within about 10 steps, as the prediction outputs start to follow the switch. This recovery is accompanied by a shift of the internal states in the higher layer. This immediate shift, enabled by error minimization, accounts for a possible mechanism for online inference of the intentions of others, as well as for segmentation of the perceptual flow.

Fig. 3. Online inference of intention.

3 Discussion: Phenomenology of self from synthesis

Here, I discuss possible correspondences between this synthetic robotics study and phenomenological understandings of self and consciousness. The second experiment showed that shifts in the intentions of others can be inferred by means of error regression in the past window. When the robot's own prediction went well, adequately synchronizing with the other, everything proceeded smoothly and automatically, and the distinction between self and other was submerged in the coherently coupled dynamics. However, when synchrony broke down due to a sudden intention change by the other, the robot should become aware of the gap between the two. This should entail consciousness, because inferring the newly changed intentional state requires considerable effort to minimize the prediction error. The self defined via this gap could consciously represent the minimal self [11, 12] in a pre-reflective form. Gallagher [11] argues that the minimal self should entail a sense of agency, by considering possible neural mechanisms underlying the pathology of delusion of control in schizophrenia. Our prior study [13] on synthetic modeling of schizophrenia shows that the proposed predictive coding model can support this account: mild perturbation of the connectivity from the higher level to the sensory-motor level, as possibly caused by this disease, can generate fictive errors and resultant maladaptation of the intentional state. This could generate a sense of fictive agency.

Next, let us consider an account of the narrative self [11] in a reflective form. The first experiment showed that the robot was able to extract compositional structure from the experience of continuous perceptual flow by learning through iterative interaction with the other. This process includes segmentation of the continuous visuo-proprioceptive stream into a set of primitives, accompanied by minimal self-consciousness. These primitives can later be reused by the higher layer, which recombines them to account for different experiences in different contexts. Experience re-represented in the higher layer in this manner is no longer pure experience, but objectified experience. Finally, we see the development of a narrative self that can represent its own experience objectively, as episodes.

References

1. Gibson, E.J., & Pick, A.D. (2000). An ecological approach to perceptual learning and development. New York: Oxford University Press.
2. Rao, R.P., & Ballard, D.H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79-87.
3. Bassett, D.S., & Gazzaniga, M.S. (2011). Understanding complexity in the human brain. Trends in Cognitive Sciences, 15(5), 200-209.
4. Kiebel, S.J., Daunizeau, J., & Friston, K.J. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4(11), e1000209.
5. Tani, J. (2016). Exploring Robotic Minds: Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena. New York: Oxford University Press.
6. Choi, M., & Tani, J. (2018). Predictive coding for dynamic visual processing: development of functional hierarchy in a multiple spatio-temporal scales RNN model. Neural Computation, 30, 237-270.
7. Hwang, J., Kim, J., Ahmadi, A., Choi, M., & Tani, J. (2018). Dealing with large-scale spatio-temporal patterns in imitative interaction between a robot and a human by using the predictive coding framework. IEEE Transactions on Systems, Man, and Cybernetics: Systems, (99), 1-14.
8. Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Computational Biology, 4(11), e1000220.
9. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
10. Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of distributedly represented multiple behavior schemata in a mirror system: reviews of robot experiments using RNNPB. Neural Networks, 17(8-9), 1273-1289.
11. Gallagher, S. (2000). Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences, 4(1), 14-21.
12. Tani, J. (1998). An interpretation of the 'self' from the dynamical systems perspective: A constructivist approach. Journal of Consciousness Studies, 5(5-6), 516-542.
13. Yamashita, Y., & Tani, J. (2012). Spontaneous prediction error generation in schizophrenia. PLoS ONE, 7(5), e37843.