A Role for Action Selection in Consciousness: An Investigation of a Second-Order Darwinian Mind

Robert H. Wortham and Joanna J. Bryson
Dept of Computer Science, University of Bath
Claverton Down, Bath, BA2 7AY, UK
Email: {r.h.wortham, j.j.bryson}@bath.ac.uk

Abstract—We investigate a small-footprint cognitive architecture comprised of two reactive planner instances. The first interacts with the world via sensor and behaviour interfaces. The second monitors the first, and dynamically adjusts its plan in accordance with some predefined objective function. We show that this configuration produces a Darwinian mind that is nevertheless aware of its own operation and performance, and able to maintain that performance as the environment changes. We identify this architecture as a second-order Darwinian mind, and discuss the philosophical implications for the study of consciousness. We use the Instinct Robot World agent-based modelling environment, which in turn uses the Instinct Planner for cognition.

BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES

From the 1950s through to the 1980s the study of embodied AI assumed a cognitive symbolic planning model for robotic systems — SMPA (Sense Model Plan Act) — the best-known example being the Shakey robot project [1]. In this model the world is first sensed and a model of the world is constructed within the AI. Based on this model and the objectives of the AI, a plan is constructed to achieve the goals of the robot. Only then does the robot act. Although this idea seemed logical and initially attractive, it proved quite inadequate for complex, real-world environments.

In the 1990s Rodney Brooks and others [2] introduced the then radical idea that it was possible to have intelligence without representation [3]. Brooks developed his subsumption architecture as a pattern for the design of intelligent embodied systems that have no internal representation of their environment, and minimal internal state. These autonomous agents could traverse difficult terrain on insect-like legs, appear to interact socially with humans through shared attention and gaze tracking, and in many ways appeared to possess behaviours similar to those observed in animals. However, the systems produced by Brooks and his colleagues could only respond immediately to stimuli from the world. They had no means of focusing attention on a specific goal, nor of executing complex sequences of actions to achieve more complex behaviours. Biologically inspired approaches are still favoured by many academics, although a wide gap exists between existing implementations and the capabilities of the human mind [4]. Today, the argument persists concerning whether symbolic, sub-symbolic or hybrid approaches are best suited to the creation of powerful cognitive systems [5]. Here we concern ourselves more specifically with action selection as a core component of any useful cognitive architecture.

From Ethology to Robots

Following in-depth studies of animals such as gulls in their natural environment, ideas of how animals perform action selection were originally formulated by Niko Tinbergen and other early ethologists [6], [7]. Reactions are based on predetermined drives and competences, but depend also on the internal state of the organism [8]. Bryson [9] harnessed these ideas to achieve a major step forwards with the POSH (Parallel Ordered Slipstack Hierarchy) reactive planner and the BOD (Behaviour Oriented Design) methodology, both of which are strongly biologically inspired. A POSH plan consists of a Drive Collection (DC) containing one or more Drives. Each Drive (D) has a priority and a releaser. When the Drive is released as a result of sensory input, a hierarchical plan of Competences, Action Patterns and Actions follows. POSH plans are authored, or designed, by humans alongside the design of senses and behaviour modules. An iterative approach is defined within BOD for the design of intelligent artefacts — these are known as agents, or if they are physically embodied, robots.
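To make the arbitration mechanism concrete, the following C++ sketch shows a POSH-style Drive Collection. It is a minimal illustration only; the type and member names are ours, not the actual POSH or Instinct API. It captures the essential rule: on each cycle, the highest-priority Drive whose releaser fires is given control.

    #include <functional>
    #include <string>
    #include <vector>

    // Illustrative POSH-style structures. All names here are our own,
    // for exposition only; they are not the actual POSH/Instinct API.
    struct Drive {
        std::string name;
        int priority;                      // higher value wins arbitration
        std::function<bool()> releaser;    // fires on matching sensory input
        std::function<void()> competence;  // hierarchical plan rooted here
    };

    struct DriveCollection {
        std::vector<Drive> drives;

        // One planner cycle: execute the highest-priority released Drive.
        void tick() const {
            const Drive* winner = nullptr;
            for (const Drive& d : drives) {
                if (d.releaser() && (winner == nullptr || d.priority > winner->priority))
                    winner = &d;
            }
            if (winner != nullptr) winner->competence();
        }
    };

In the real planner a released Drive hands control down through Competences and Action Patterns rather than a single callback, but priority-plus-releaser arbitration is the core of the mechanism; the robots in this study, for example, have Drives to explore, avoid walls and interact (see Methods).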
Kinds of Minds

Daniel Dennett [10] elegantly outlines a high-level ontology for the kinds of minds that exist in the natural world. At the most basic level, the Darwinian mind produces 'hardwired' behaviours, or phenotypes, based on the genetic coding of the organism. The Skinnerian mind is plastic, and capable of 'ABC' learning — Associationism, Behaviourism, Connectionism. The Popperian mind runs simulations to predict the effect of planned actions, anticipating experience. It therefore permits hypotheses "to die in our head" rather than requiring them to be executed in the world before learning can take place. Finally the Gregorian mind (after the psychologist Richard Gregory) is able to import tools from the cultural environment, for example language and writing. Using these tools enables the Gregorian mind, for example the human mind, to be self-reflective.

However, perhaps the simple Darwinian mind might also be arranged to monitor itself, and in some small and limited sense to be aware of its own performance and act to correct it. Bryson suggests that consciousness might assist in action selection [11], and here we investigate whether action selection achieved through reactive planning might parallel one of the commonly accepted characteristics of consciousness: that is, to be self-reflective and regulating [12].
Instinct and the Robot World

The Instinct Planner [13] is a biologically inspired reactive planner specifically designed for low-power processors and embedded real-time AI environments. Written in C++, it runs efficiently in both Arduino and Microsoft VC++ environments and has been deployed within the R5 low-cost maker robot to study AI Transparency [14]. Its unique features are its tiny memory footprint and efficient operation, meaning that it can operate on a low-powered micro-controller environment such as Arduino. Alternatively, as in this experiment, many planners can run within one application on a laptop PC.

The Instinct Robot World is a new agent-based modelling tool, shown in Figure 1. This is an open source project and all code and configuration files are available online.¹ Each virtual 'robot' within the Robot World uses an Instinct Planner to provide action selection. Strictly, since these virtual robots are not physically embodied, we should refer to them as agents. However, we have chosen to use 'robot' throughout, as intuitively these cognitive entities appear to be virtually embodied within the Robot World, and this choice of language seems more natural. In the final section of this paper we discuss future work where we may realise physical embodiment of this architecture.

Fig. 1: Screen shot of the Instinct Robot World in operation. Each robot is represented as a single character within the display. Robots are labelled with letters and numbers to distinguish them. When a robot's monitor plan becomes active the robot representation changes to the shriek character (!). The top right section of the screen is used to control the robots and the plans they use. The bottom right section displays statistics about the world as it runs.

The Robot World allows many robots to be instantiated, each with the same reactive plan, or with a variety of plans. The robots each have senses to detect the 'walls' of the environment, and other robots. The reactive plan invokes simple behaviours to move the robot, adjust its speed and direction, or interact with robots that it encounters within the world as it moves. Most importantly for this investigation, each robot also has a second Instinct Planner instance. This planner monitors the first, and is able to modify its parameters based on a predefined plan.

The Instinct Robot World provides statistical monitoring to report on the overall activity of the robots within the world. The statistics include the average percentage of robots that are moving at any one time, the average number of time units (ticks) between robot interactions, and the average amount of time that the monitor planner intervenes to modify the robot plan. We use the Instinct Robot World to investigate the idea of Reflective Reactive Planning — one reactive planner driving behaviour based on sensory input and predefined drives and competences, and another reactive planner monitoring performance and intervening to modify the predefined plan of the first, in accordance with some higher-level objective. This simple combination of two Darwinian minds, one monitoring the other, might also be considered to be a second-order Darwinian mind; a sketch of the arrangement follows.

¹ http://www.robwortham.com/instinct-planner/
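The wiring of the two planner instances can be sketched as below (and is shown schematically in Figure 2). This is a rough illustration under our own naming; the real Instinct interfaces differ. The key point is that the monitor planner's 'senses' read the first planner's state, while its 'behaviours' write that planner's parameters.

    // Rough sketch of one Robot World robot (names are ours, not Instinct's).
    struct PlannerState {
        long ticksSinceInteraction = 0;  // exposed for the monitor to read
        bool moving = false;
    };

    class ReactivePlanner {
    public:
        PlannerState state;
        int interactionPriority = 5;     // plan parameter a monitor may rewrite
        void tick() { /* sense the world, select and execute one action */ }
    };

    class MonitorPlanner {
    public:
        explicit MonitorPlanner(ReactivePlanner& p) : monitored(p) {}
        // The monitor's releaser tests the first planner's state; its
        // behaviour modifies that planner's plan parameters.
        void tick() {
            if (tooManyInteractions(monitored.state))
                --monitored.interactionPriority;  // demote the interaction Drive
        }
    private:
        ReactivePlanner& monitored;
        static bool tooManyInteractions(const PlannerState&) {
            return false;  // placeholder; the actual rule is given in Methods
        }
    };

    struct Robot {
        ReactivePlanner planner;
        MonitorPlanner monitor{planner};
        void tick() { planner.tick(); monitor.tick(); }  // both run each world tick
    };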
[Figure 2 here: block diagram titled "Reflective Reactive Planning - A 2nd Order Darwinian Mind", showing the robot's Reactive Planner connected to the Sensor model, Behaviour Library and Internal Robot State, monitored by a second planner comprising a Plan Manager, Plan Modifier and Plan model.]

Fig. 2: Architecture of the second-order Darwinian mind. The robot is controlled by the Instinct Reactive Planner as it interacts with the Sensor model and Behaviour Library. In turn, a second instance of Instinct monitors the first, together with the Internal robot state, and dynamically modifies parameters within the robot's planner. The overall effect is a robot that not only reacts to its environment according to a predefined set of goals, but is also able to modify that interaction according to some performance measure calculated within the Plan model.

CONJECTURES

We expect that second-order Darwinian minds will outperform first-order minds when the environment changes, because the monitor planner is concerned with achieving higher-order objectives, and modifies the operation of the first planner to improve its performance. We also hypothesise that this architecture will remain stable over extended periods of time, because by restricting ourselves to the reactive planning paradigm we have reduced the number of degrees of freedom within which the architecture must operate, and previous work shows that first-order minds produce reliable control architectures [14]. Finally, we expect that such a second-order system should be relatively simple to design, being modular, well structured and conceptually straightforward.

METHODS

Figure 2 shows the Reflective Reactive Planning architecture implemented within the Instinct Robot World, and controlling the behaviour of each robot within that world. The robot plan has the following simple objectives, each implemented as an Instinct Drive.

• Move around in the environment so as to explore it.
• Avoid objects, i.e. the walls marked as 'X' in Figure 1.
• Interact when another robot is 'encountered', i.e. when another robot is sensed as having the same coordinates within the grid of the Robot World. This interaction causes the robot to stop for 200 clock cycles or 'ticks'.

While the robot is in the 'Interacting' state it is shown as a shriek character (!) within the Robot World display. Once the robot has interacted its priority for interaction decreases, but ramps up over time. This may be likened to most natural drives, for example mating, feeding and the need for social interaction.

The Monitor Plan is designed to keep the robot exploring when it is overly diverted by social interactions. It achieves this by monitoring the time between interactions. If, over three interactions, the average time between interactions falls below 1000 ticks, then the Monitor Planner reduces the priority of the interaction Drive. After 1000 ticks the priority is reset to its original level. We might use alternative intentional language here to say that the Monitor Planner 'notices' that the robot is being diverted by too many social interactions. It then reduces the priority of those interactions, so that the robot is diverted less frequently. After some time the Monitor Planner ceases to intervene, until it next notices this situation re-occurring.
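This intervention rule is small enough to state in code. The sketch below is again illustrative: in the implementation the rule is encoded as Instinct plan elements rather than C++, and the size of the priority reduction is our assumption. It shows the three-interaction rolling average, the 1000-tick threshold, and the reset.

    #include <deque>
    #include <numeric>

    // Illustrative monitor logic (hypothetical encoding; the real rule is
    // expressed as an Instinct plan, and the demotion amount is our guess).
    class InteractionMonitor {
    public:
        // Called when an interaction completes, with the ticks since the last one.
        void onInteraction(long ticksBetween) {
            intervals.push_back(ticksBetween);
            if (intervals.size() > 3) intervals.pop_front();
        }

        // Called once per world tick; adjusts the interaction Drive's priority.
        void tick(int& interactionPriority, int originalPriority) {
            if (!intervening && intervals.size() == 3) {
                long mean = std::accumulate(intervals.begin(), intervals.end(), 0L) / 3;
                if (mean < 1000) {                           // too much socialising
                    interactionPriority = originalPriority - 1;  // demote the Drive
                    intervening = true;
                    ticksSinceIntervention = 0;
                }
            } else if (intervening && ++ticksSinceIntervention >= 1000) {
                interactionPriority = originalPriority;      // reset after 1000 ticks
                intervening = false;
            }
        }

    private:
        std::deque<long> intervals;        // most recent inter-interaction times
        long ticksSinceIntervention = 0;
        bool intervening = false;
    };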
The Robot World is populated with varying numbers of robots (2, 3, 5, 10, 20, 50, 100, 200, 500, 1000), and for each number the experiment is run twice, once with a monitor plan and once without. For each run, the environment is allowed to run for some time, typically about 10 minutes, until the reported statistics have settled and are seen to be no longer changing over time.

OUTCOMES

The results are most elegantly and succinctly presented as simple graphs. Firstly, the average number of robots moving at any one time within the world is shown in Figure 3. In both cases, as the number of robots within the world increases, the amount of time that each robot spends moving reduces. However, the Monitor Planner acts to reduce the extent of this reduction from 60% to less than 20% over the full range of two to a thousand robots within the world.

Fig. 3: This graph shows the average percentage of robots that are moving at any one time within the world, for a given total number of robots in the world. It can be seen that the addition of the monitor plan keeps more robots moving as the number of robots increases. Note the log scale for robots in the world.

Similarly, in Figure 4 we see that as more robots are introduced into the world, the average time between interactions naturally reduces. However, the action of the Monitor Planner progressively limits this reduction, so that with 1000 robots the time between interactions is almost trebled, from 310 to 885 ticks per interaction. Interestingly, in both these graphs we see smooth curves both with and without the action of the monitor plan.

Fig. 4: This graph shows the average time between robot interactions, both with and without the monitor plan. The addition of the monitor plan reduces the variance in interaction time as robot numbers vary. Again, note the log scale.

The final graph, Figure 5, shows a smooth, sigmoid-like increase in activation of the Monitor Planner as the number of robots increases, plotted on a logarithmic scale.

Fig. 5: This graph shows the average percentage of robots whose monitor plan is activated at any one time, for a given total number of robots in the world. Note the log scale.

The Instinct Robot World was found to be a stable, reliable platform for our experiments, and the results it achieved were repeatable. The application is single threaded, and so uses only one core of the CPU on the laptop PC on which it was run. Nevertheless, it was possible to simulate 1000 robots, with both reactive planners active, operating in the world at the rate of 70 clock cycles (ticks) per second.
DISCUSSION

From the results we can see that by using a second Instinct instance to monitor the first, we can achieve real-time learning within a tiny-footprint yet nevertheless symbolic cognitive architecture. In addition, since this learning modifies parameters of a human-designed plan, the learning can be well understood and is transparent in nature. This contrasts strongly with machine learning approaches such as neural networks, which typically learn offline, are opaque, and require a much larger memory workspace. Despite the stochastic nature of the environment, the performance graphs show smooth curves over a wide range of robot populations.

This relatively simple experiment also provides further fuel for the fire concerning the philosophical discussion of the nature of consciousness. Critics may say that when we use the intentional stance [15] to describe the behaviour of the Monitor Planner as 'noticing' something, we are merely using metaphor. They might argue that there is in fact no sentience doing any noticing, and that the only 'noticing' happening here is us noticing the behaviour of this human-designed mechanism, which itself is operating quite without any sentience and certainly without being conscious [16]. But that is to miss the point. We are not claiming that this architecture is conscious in the human or even any significant sense of the word, merely that our architecture is inspired by one aspect of how biological consciousness appears to operate. However, having shown that this architecture can indeed provide adaptive control, and drawing on the knowledge that gene expression produces behaviours which can be modelled using reactive planning, we might also consider whether consciousness in animals and humans may indeed arise from complex hierarchical mechanisms. These mechanisms are biologically pre-determined by genetics, and yet in combination yield flexible, adaptive systems able to respond to changing environments and optimise for objective functions unrelated to the immediate competences of preprogrammed behavioural responses. This is not to argue for some kind of emergence [17], spooky or otherwise, but more simply to add weight to the idea that the 'I' in consciousness is nothing more than an internal introspective narrative, and that such a narrative may be generated by hierarchical mechanisms that notice one another's internal states, decision processes and progress towards pre-defined (phenotypic) objectives.

We could certainly envisage a much grander architecture, assembled at the level of reactive planners, using maybe hundreds or thousands of planners, each concerned with certain objectives. Many of these planners may be homeostatic in nature, whilst others would be concerned with the achievement of higher-level objectives. We must remember that planners merely coordinate action selection, and say nothing about how sensor models may be formed, nor how complex behaviours themselves may be implemented. However, all dynamic architectures need some kind of decision-centric 'glue' to bind them together, and reactive planning seems to be a useful candidate here, as evidenced by practical experiment and biological underpinning.

Machine transparency is a core element of our research. We have shown elsewhere [14] that reactive planners, particularly the Instinct Planner, are able to facilitate transparency. This is due to the human design of their plans, and the ability to gather meaningful symbolic information about internal system state and decision processes in real time as the planner operates. This ability to inspect the operation of the architecture may assist designers in achieving larger-scale cognitive implementations. Equally importantly, transparency is an important consideration for users and operators of intelligent systems, particularly robots, and this is highlighted in the EPSRC Principles of Robotics [18].
The human brain does not run by virtue of some elegant algorithm. It is a hack, built by the unseeing forces of evolution, without foresight or consideration for modularity, transparency or any other good design practice. If we are to build intelligent systems, the brain is not a good physical model from which we should proceed. Rather, we should look at the behaviours of intelligent organisms, model the way in which these organisms react, and then scale up these models to build useful, manageable intelligent systems.

Whilst our Reflective Reactive Planner is a very simple architecture, it does share many of the characteristics cited for architectures that are worthy of evaluation, such as efficiency and scalability, reactivity and persistence, improvability, and autonomy and extended operation [19]. We hope that our work with reactive planners might strengthen the case for their consideration in situations where decision-centric 'glue' is required.

CONCLUSIONS AND FURTHER WORK
We have shown that a second-order Darwinian mind may be constructed from two instances of the Instinct reactive planner. This architecture, which we call Reflective Reactive Planning, successfully controls the behaviour of a virtual robot within a simulated world, according to pre-defined goals and higher-level objectives. We have shown how this architecture may provide both practical cognitive implementations, and inform philosophical discussion on the nature and purpose of consciousness.

The Instinct Robot World is an entirely open source platform, available online. We welcome those interested in agent-based modelling, cognitive architectures generally, and reactive planning specifically, to investigate these technologies and offer suggestions for new applications and further work. One possibility might be to apply this architecture to the Small Loop Problem [20], a specific challenge for biologically inspired cognitive architectures.

We continue to develop robot applications for the Instinct Planner, together with the Instinct Robot World. We are investigating the use of a small robot swarm to build a physically embodied version of this experiment. To this end, we are currently working with the University of Manchester's Mona robot.²

² http://www.monarobot.uk/

REFERENCES

[1] N. J. Nilsson, "Shakey the Robot," SRI International, Technical Note 323, 1984.
[2] C. Breazeal and B. Scassellati, "Robots that imitate humans," Trends in Cognitive Sciences, vol. 6, no. 11, pp. 481–487, 2002.
[3] R. A. Brooks, "Intelligence without representation," Artificial Intelligence, vol. 47, no. 1, pp. 139–159, 1991.
[4] A. V. Samsonovich, "Extending cognitive architectures," Advances in Intelligent Systems and Computing, vol. 196 AISC, pp. 41–49, 2013.
[5] A. Lieto, A. Chella, and M. Frixione, "Conceptual spaces for cognitive architectures: A lingua franca for different levels of representation," Biologically Inspired Cognitive Architectures, pp. 1–9, 2016. [Online]. Available: http://dx.doi.org/10.1016/j.bica.2016.10.005
[6] N. Tinbergen, The Study of Instinct. Oxford, UK: Oxford University Press, 1951.
[7] N. Tinbergen and H. Falkus, Signals for Survival. Oxford: Clarendon Press, 1970.
[8] J. J. Bryson, "The study of sequential and hierarchical organisation of behaviour via artificial mechanisms of action selection," M.Phil. thesis, University of Edinburgh, 2000.
[9] J. J. Bryson, "Intelligence by design: Principles of modularity and coordination for engineering complex adaptive agents," Ph.D. dissertation, MIT, Department of EECS, Cambridge, MA, June 2001, AI Technical Report 2001-003.
[10] D. C. Dennett, Kinds of Minds: Towards an Understanding of Consciousness. Weidenfeld and Nicolson, 1996.
[11] J. J. Bryson, "A role for consciousness in action selection," in Proceedings of the AISB 2011 Symposium: Machine Consciousness, R. Chrisley, R. Clowes, and S. Torrance, Eds. York: SSAISB, 2011, pp. 15–20.
[12] J. W. Sherman, B. Gawronski, and Y. Trope, Dual-Process Theories of the Social Mind. Guilford Publications, 2014.
[13] R. H. Wortham, S. E. Gaudl, and J. J. Bryson, "Instinct: A biologically inspired reactive planner for embedded environments," in Proceedings of the ICAPS 2016 PlanRob Workshop, London, UK, 2016. [Online]. Available: http://icaps16.icaps-conference.org/proceedings/planrob16.pdf
[14] R. H. Wortham, A. Theodorou, and J. J. Bryson, "Robot transparency: Improving understanding of intelligent behaviour for designers and users," in Proceedings of TAROS 2017, Guildford, UK, 2017, accepted for publication.
[15] D. C. Dennett, The Intentional Stance. MIT Press, 1989.
[16] P. O. A. Haikonen, "Consciousness and sentient robots," International Journal of Machine Consciousness, vol. 5, no. 1, pp. 11–26, 2013.
[17] J. H. Holland, Emergence: From Chaos to Order. Oxford University Press, 2000.
[18] M. Boden, J. Bryson, D. Caldwell, K. Dautenhahn, L. Edwards, S. Kember, P. Newman, V. Parry, G. Pegman, T. Rodden, T. Sorell, M. Wallis, B. Whitby, and A. Winfield, "Principles of robotics," The United Kingdom's Engineering and Physical Sciences Research Council (EPSRC), April 2011, web publication.
[19] P. Langley, J. E. Laird, and S. Rogers, "Cognitive architectures: Research issues and challenges," Cognitive Systems Research, vol. 10, no. 2, pp. 141–160, June 2009.
[20] O. L. Georgeon and J. B. Marshall, "The small loop problem: A challenge for artificial emergent cognition," Advances in Intelligent Systems and Computing, vol. 196 AISC, pp. 137–144, 2013.