A Role for Action Selection in Consciousness: An Investigation of a Second-Order Darwinian Mind

Robert H. Wortham and Joanna J. Bryson
Dept of Computer Science, University of Bath
Claverton Down, Bath, BA2 7AY, UK
Email: {r.h.wortham, j.j.bryson}@bath.ac.uk

Abstract—We investigate a small-footprint cognitive architecture comprised of two reactive planner instances. The first interacts with the world via sensor and behaviour interfaces. The second monitors the first, and dynamically adjusts its plan in accordance with some predefined objective function. We show that this configuration produces a Darwinian mind that is nevertheless aware of its own operation and performance, and able to maintain that performance as the environment changes. We identify this architecture as a second-order Darwinian mind, and discuss the philosophical implications for the study of consciousness. We use the Instinct Robot World agent-based modelling environment, which in turn uses the Instinct Planner for cognition.

BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES

From the 1950s through to the 1980s the study of embodied AI assumed a cognitive symbolic planning model for robotic systems — SMPA (Sense Model Plan Act) — the best-known example being the Shakey robot project [1]. In this model the world is first sensed and a model of the world is constructed within the AI. Based on this model and the objectives of the AI, a plan is constructed to achieve the goals of the robot. Only then does the robot act. Although this idea seemed logical and initially attractive, it proved quite inadequate for complex, real-world environments.

In the 1990s Rodney Brooks and others [2] introduced the then radical idea that it was possible to have intelligence without representation [3]. Brooks developed his subsumption architecture as a pattern for the design of intelligent embodied systems that have no internal representation of their environment, and minimal internal state. These autonomous agents could traverse difficult terrain on insect-like legs, appear to interact socially with humans through shared attention and gaze tracking, and in many ways appeared to possess behaviours similar to those observed in animals. However, the systems produced by Brooks and his colleagues could only respond immediately to stimuli from the world. They had no means of focusing attention on a specific goal, nor of executing complex sequences of actions to achieve more complex behaviours. Biologically inspired approaches are still favoured by many academics, although a wide gap exists between existing implementations and the capabilities of the human mind [4]. Today, the argument persists concerning whether symbolic, sub-symbolic or hybrid approaches are best suited to the creation of powerful cognitive systems [5]. Here we concern ourselves more specifically with action selection as a core component of any useful cognitive architecture.

From Ethology to Robots

Following in-depth studies of animals such as gulls in their natural environment, ideas of how animals perform action selection were originally formulated by Niko Tinbergen and other early ethologists [6], [7]. Reactions are based on predetermined drives and competences, but depend also on the internal state of the organism [8]. Bryson [9] harnessed these ideas to achieve a major step forwards with the POSH (Parallel Ordered Slipstack Hierarchy) reactive planner and the BOD (Behaviour Oriented Design) methodology, both of which are strongly biologically inspired. A POSH plan consists of a Drive Collection (DC) containing one or more Drives. Each Drive (D) has a priority and a releaser. When the Drive is released as a result of sensory input, a hierarchical plan of Competences, Action Patterns and Actions follows. POSH plans are authored, or designed, by humans alongside the design of senses and behaviour modules. An iterative approach is defined within BOD for the design of intelligent artefacts — these are known as agents, or if they are physically embodied, robots.
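To make the arbitration mechanism concrete, the following C++ sketch shows a POSH-style Drive Collection. It is a minimal illustration only; the type and member names are ours, not the actual POSH or Instinct API. It captures the essential rule: on each cycle, the highest-priority Drive whose releaser fires is given control.

    #include <functional>
    #include <string>
    #include <vector>

    // Illustrative POSH-style structures. All names here are our own,
    // for exposition only; they are not the actual POSH/Instinct API.
    struct Drive {
        std::string name;
        int priority;                      // higher value wins arbitration
        std::function<bool()> releaser;    // fires on matching sensory input
        std::function<void()> competence;  // hierarchical plan rooted here
    };

    struct DriveCollection {
        std::vector<Drive> drives;

        // One planner cycle: execute the highest-priority released Drive.
        void tick() const {
            const Drive* winner = nullptr;
            for (const Drive& d : drives) {
                if (d.releaser() && (winner == nullptr || d.priority > winner->priority))
                    winner = &d;
            }
            if (winner != nullptr) winner->competence();
        }
    };

In the real planner a released Drive hands control down through Competences and Action Patterns rather than a single callback, but priority-plus-releaser arbitration is the core of the mechanism; the robots in this study, for example, have Drives to explore, avoid walls and interact (see Methods).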
Kinds of Minds

Daniel Dennett [10] elegantly outlines a high-level ontology for the kinds of minds that exist in the natural world. At the most basic level, the Darwinian mind produces 'hardwired' behaviours, or phenotypes, based on the genetic coding of the organism. The Skinnerian mind is plastic, and capable of 'ABC' learning — Associationism, Behaviourism, Connectionism. The Popperian mind runs simulations to predict the effect of planned actions, anticipating experience. It therefore permits hypotheses "to die in our head" rather than requiring them to be executed in the world before learning can take place. Finally the Gregorian mind (after the psychologist Richard Gregory) is able to import tools from the cultural environment, for example language and writing. Using these tools enables the Gregorian mind, for example the human mind, to be self-reflective.

However, perhaps the simple Darwinian mind might also be arranged to monitor itself, and in some small and limited sense to be aware of its own performance and act to correct it. Bryson suggests that consciousness might assist in action selection [11], and here we investigate whether action selection achieved through reactive planning might parallel one of the commonly accepted characteristics of consciousness: that is, to be self-reflective and regulating [12].
Instinct and the Robot World

The Instinct Planner [13] is a biologically inspired reactive planner specifically designed for low-power processors and embedded real-time AI environments. Written in C++, it runs efficiently in both Arduino and Microsoft VC++ environments and has been deployed within the R5 low-cost maker robot to study AI Transparency [14]. Its unique features are its tiny memory footprint and efficient operation, meaning that it can operate on a low-powered micro-controller environment such as Arduino. Alternatively, as in this experiment, many planners can run within one application on a laptop PC.

The Instinct Robot World is a new agent-based modelling tool, shown in Figure 1. This is an open source project and all code and configuration files are available online.¹ Each virtual 'robot' within the Robot World uses an Instinct Planner to provide action selection. Strictly, since these virtual robots are not physically embodied, we should refer to them as agents. However, we have chosen to use 'robot' throughout, as intuitively these cognitive entities appear to be virtually embodied within the Robot World, and this choice of language seems more natural. In the final section of this paper we discuss future work where we may realise physical embodiment of this architecture.

Fig. 1: Screen shot of the Instinct Robot World in operation. Each robot is represented as a single character within the display. Robots are labelled with letters and numbers to distinguish them. When a robot's monitor plan becomes active the robot representation changes to the shriek character (!). The top right section of the screen is used to control the robots and the plans they use. The bottom right section displays statistics about the world as it runs.

The Robot World allows many robots to be instantiated, each with the same reactive plan, or with a variety of plans. The robots each have senses to detect the 'walls' of the environment, and other robots. The reactive plan invokes simple behaviours to move the robot, adjust its speed and direction, or interact with robots that it encounters within the world as it moves. Most importantly for this investigation, each robot also has a second Instinct Planner instance. This planner monitors the first, and is able to modify its parameters based on a predefined plan.

The Instinct Robot World provides statistical monitoring to report on the overall activity of the robots within the world. The statistics include the average percentage of robots that are moving at any one time, the average number of time units (ticks) between robot interactions, and the average amount of time that the monitor planner intervenes to modify the robot plan. We use the Instinct Robot World to investigate the idea of Reflective Reactive Planning — one reactive planner driving behaviour based on sensory input and predefined drives and competences, and another reactive planner monitoring performance and intervening to modify the predefined plan of the first, in accordance with some higher-level objective. This simple combination of two Darwinian minds, one monitoring the other, might also be considered to be a second-order Darwinian mind; a sketch of the arrangement follows.

¹ http://www.robwortham.com/instinct-planner/
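The wiring of the two planner instances can be sketched as below (and is shown schematically in Figure 2). This is a rough illustration under our own naming; the real Instinct interfaces differ. The key point is that the monitor planner's 'senses' read the first planner's state, while its 'behaviours' write that planner's parameters.

    // Rough sketch of one Robot World robot (names are ours, not Instinct's).
    struct PlannerState {
        long ticksSinceInteraction = 0;  // exposed for the monitor to read
        bool moving = false;
    };

    class ReactivePlanner {
    public:
        PlannerState state;
        int interactionPriority = 5;     // plan parameter a monitor may rewrite
        void tick() { /* sense the world, select and execute one action */ }
    };

    class MonitorPlanner {
    public:
        explicit MonitorPlanner(ReactivePlanner& p) : monitored(p) {}
        // The monitor's releaser tests the first planner's state; its
        // behaviour modifies that planner's plan parameters.
        void tick() {
            if (tooManyInteractions(monitored.state))
                --monitored.interactionPriority;  // demote the interaction Drive
        }
    private:
        ReactivePlanner& monitored;
        static bool tooManyInteractions(const PlannerState&) {
            return false;  // placeholder; the actual rule is given in Methods
        }
    };

    struct Robot {
        ReactivePlanner planner;
        MonitorPlanner monitor{planner};
        void tick() { planner.tick(); monitor.tick(); }  // both run each world tick
    };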
[Figure 2 here: block diagram titled "Reflective Reactive Planning - A 2nd Order Darwinian Mind", showing the robot's Reactive Planner connected to the Sensor model, Behaviour Library and Internal Robot State, monitored by a second planner comprising a Plan Manager, Plan Modifier and Plan model.]

Fig. 2: Architecture of the second-order Darwinian mind. The robot is controlled by the Instinct Reactive Planner as it interacts with the Sensor model and Behaviour Library. In turn, a second instance of Instinct monitors the first, together with the Internal robot state, and dynamically modifies parameters within the robot's planner. The overall effect is a robot that not only reacts to its environment according to a predefined set of goals, but is also able to modify that interaction according to some performance measure calculated within the Plan model.

CONJECTURES

We expect that second-order Darwinian minds will outperform first-order minds when the environment changes, because the monitor planner is concerned with achieving higher-order objectives, and modifies the operation of the first planner to improve its performance. We also hypothesise that this architecture will remain stable over extended periods of time, because by restricting ourselves to the reactive planning paradigm we have reduced the number of degrees of freedom within which the architecture must operate, and previous work shows that first-order minds produce reliable control architectures [14]. Finally, we expect that such a second-order system should be relatively simple to design, being modular, well structured and conceptually straightforward.

METHODS

Figure 2 shows the Reflective Reactive Planning architecture implemented within the Instinct Robot World, and controlling the behaviour of each robot within that world. The robot plan has the following simple objectives, each implemented as an Instinct Drive.

• Move around in the environment so as to explore it.
• Avoid objects, i.e. the walls marked as 'X' in Figure 1.
• Interact when another robot is 'encountered', i.e. when another robot is sensed as having the same coordinates within the grid of the Robot World. This interaction causes the robot to stop for 200 clock cycles or 'ticks'.

While the robot is in the 'Interacting' state it is shown as a shriek character (!) within the Robot World display. Once the robot has interacted its priority for interaction decreases, but ramps up over time. This may be likened to most natural drives, for example mating, feeding and the need for social interaction.

The Monitor Plan is designed to keep the robot exploring when it is overly diverted by social interactions. It achieves this by monitoring the time between interactions. If, over three interactions, the average time between interactions falls below 1000 ticks, then the Monitor Planner reduces the priority of the interaction Drive. After 1000 ticks the priority is reset to its original level. We might use alternative intentional language here to say that the Monitor Planner 'notices' that the robot is being diverted by too many social interactions. It then reduces the priority of those interactions, so that the robot is diverted less frequently. After some time the Monitor Planner ceases to intervene, until it next notices this situation re-occurring.
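This intervention rule is small enough to state in code. The sketch below is again illustrative: in the implementation the rule is encoded as Instinct plan elements rather than C++, and the size of the priority reduction is our assumption. It shows the three-interaction rolling average, the 1000-tick threshold, and the reset.

    #include <deque>
    #include <numeric>

    // Illustrative monitor logic (hypothetical encoding; the real rule is
    // expressed as an Instinct plan, and the demotion amount is our guess).
    class InteractionMonitor {
    public:
        // Called when an interaction completes, with the ticks since the last one.
        void onInteraction(long ticksBetween) {
            intervals.push_back(ticksBetween);
            if (intervals.size() > 3) intervals.pop_front();
        }

        // Called once per world tick; adjusts the interaction Drive's priority.
        void tick(int& interactionPriority, int originalPriority) {
            if (!intervening && intervals.size() == 3) {
                long mean = std::accumulate(intervals.begin(), intervals.end(), 0L) / 3;
                if (mean < 1000) {                           // too much socialising
                    interactionPriority = originalPriority - 1;  // demote the Drive
                    intervening = true;
                    ticksSinceIntervention = 0;
                }
            } else if (intervening && ++ticksSinceIntervention >= 1000) {
                interactionPriority = originalPriority;      // reset after 1000 ticks
                intervening = false;
            }
        }

    private:
        std::deque<long> intervals;        // most recent inter-interaction times
        long ticksSinceIntervention = 0;
        bool intervening = false;
    };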
The Robot World is populated with varying numbers of robots (2, 3, 5, 10, 20, 50, 100, 200, 500, 1000), and for each number the experiment is run twice, once with a monitor plan and once without. For each run, the environment is allowed to run for some time, typically about 10 minutes, until the reported statistics have settled and are seen to be no longer changing over time.

OUTCOMES

The results are most elegantly and succinctly presented as simple graphs. Firstly, the average number of robots moving at any one time within the world is shown in Figure 3. In both cases, as the number of robots within the world increases, the amount of time that each robot spends moving reduces. However, the Monitor Planner acts to reduce the extent of this reduction from 60% to less than 20% over the full range of two to a thousand robots within the world.

Fig. 3: This graph shows the average percentage of robots that are moving at any one time within the world, for a given total number of robots in the world. It can be seen that the addition of the monitor plan keeps more robots moving as the number of robots increases. Note the log scale for robots in the world.

Similarly, in Figure 4 we see that as more robots are introduced into the world, the average time between interactions naturally reduces. However, the action of the Monitor Planner progressively limits this reduction, so that with 1000 robots the time between interactions is almost trebled, from 310 to 885 ticks per interaction. Interestingly, in both these graphs we see smooth curves both with and without the action of the monitor plan.

Fig. 4: This graph shows the average time between robot interactions, both with and without the monitor plan. The addition of the monitor plan reduces the variance in interaction time as robot numbers vary. Again, note the log scale.

The final graph, Figure 5, shows a smooth, sigmoid-like increase in activation of the Monitor Planner as the number of robots increases, plotted on a logarithmic scale.

Fig. 5: This graph shows the average percentage of robots whose monitor plan is activated at any one time, for a given total number of robots in the world. Note the log scale.

The Instinct Robot World was found to be a stable, reliable platform for our experiments, and the results it achieved were repeatable. The application is single threaded, and so uses only one core of the CPU on the laptop PC on which it was run. Nevertheless, it was possible to simulate 1000 robots, with both reactive planners active, operating in the world at the rate of 70 clock cycles (ticks) per second.
DISCUSSION

From the results we can see that by using a second Instinct instance to monitor the first, we can achieve real-time learning within a tiny-footprint yet nevertheless symbolic cognitive architecture. In addition, since this learning modifies parameters of a human-designed plan, the learning can be well understood and is transparent in nature. This contrasts strongly with machine learning approaches such as neural networks, which typically learn offline, are opaque, and require a much larger memory workspace. Despite the stochastic nature of the environment, the performance graphs show smooth curves over a wide range of robot populations.

This relatively simple experiment also provides further fuel for the fire concerning the philosophical discussion of the nature of consciousness. Critics may say that when we use the intentional stance [15] to describe the behaviour of the Monitor Planner as 'noticing' something, we are merely using metaphor. They might argue that there is in fact no sentience doing any noticing, and that the only 'noticing' happening here is us noticing the behaviour of this human-designed mechanism, which itself is operating quite without any sentience and certainly without being conscious [16]. But that is to miss the point. We are not claiming that this architecture is conscious in the human or even any significant sense of the word, merely that our architecture is inspired by one aspect of how biological consciousness appears to operate. However, having shown that this architecture can indeed provide adaptive control, and drawing on the knowledge that gene expression produces behaviours which can be modelled using reactive planning, we might also consider whether consciousness in animals and humans may indeed arise from complex hierarchical mechanisms. These mechanisms are biologically pre-determined by genetics, and yet in combination yield flexible, adaptive systems able to respond to changing environments and optimise for objective functions unrelated to the immediate competences of preprogrammed behavioural responses. This is not to argue for some kind of emergence [17], spooky or otherwise, but more simply to add weight to the idea that the 'I' in consciousness is nothing more than an internal introspective narrative, and that such a narrative may be generated by hierarchical mechanisms that notice one another's internal states, decision processes and progress towards pre-defined (phenotypic) objectives.

We could certainly envisage a much grander architecture, assembled at the level of reactive planners, using maybe hundreds or thousands of planners, each concerned with certain objectives. Many of these planners may be homeostatic in nature, whilst others would be concerned with the achievement of higher-level objectives. We must remember that planners merely coordinate action selection, and say nothing about how sensor models may be formed, nor how complex behaviours themselves may be implemented. However, all dynamic architectures need some kind of decision-centric 'glue' to bind them together, and reactive planning seems to be a useful candidate here, as evidenced by practical experiment and biological underpinning.

Machine transparency is a core element of our research. We have shown elsewhere [14] that reactive planners, particularly the Instinct Planner, are able to facilitate transparency. This is due to the human design of their plans, and the ability to gather meaningful symbolic information about internal system state and decision processes in real time as the planner operates. This ability to inspect the operation of the architecture may assist designers in achieving larger-scale cognitive implementations. Equally importantly, transparency is an important consideration for users and operators of intelligent systems, particularly robots, and this is highlighted in the EPSRC Principles of Robotics [18].
The human brain does not run by virtue of some elegant algorithm. It is a hack, built by the unseeing forces of evolution, without foresight or consideration for modularity, transparency or any other good design practice. If we are to build intelligent systems, the brain is not a good physical model from which we should proceed. Rather, we should look at the behaviours of intelligent organisms, model the way in which these organisms react, and then scale up these models to build useful, manageable intelligent systems.

Whilst our Reflective Reactive Planner is a very simple architecture, it does share many of the characteristics cited for architectures that are worthy of evaluation, such as efficiency and scalability, reactivity and persistence, improvability, and autonomy and extended operation [19]. We hope that our work with reactive planners might strengthen the case for their consideration in situations where decision-centric 'glue' is required.

CONCLUSIONS AND FURTHER WORK
We have shown that a second-order Darwinian mind may be constructed from two instances of the Instinct reactive planner. This architecture, which we call Reflective Reactive Planning, successfully controls the behaviour of a virtual robot within a simulated world, according to pre-defined goals and higher-level objectives. We have shown how this architecture may provide both practical cognitive implementations, and inform philosophical discussion on the nature and purpose of consciousness.

The Instinct Robot World is an entirely open source platform, available online. We welcome those interested in agent-based modelling, cognitive architectures generally, and reactive planning specifically, to investigate these technologies and offer suggestions for new applications and further work. One possibility might be to apply this architecture to the Small Loop Problem [20], a specific challenge for biologically inspired cognitive architectures.

We continue to develop robot applications for the Instinct Planner, together with the Instinct Robot World. We are investigating the use of a small robot swarm to build a physically embodied version of this experiment. To this end, we are currently working with the University of Manchester's Mona robot.²

² http://www.monarobot.uk/

REFERENCES

[1] N. J. Nilsson, "Shakey the Robot," SRI International, Technical Note 323, 1984.
[2] C. Breazeal and B. Scassellati, "Robots that imitate humans," Trends in Cognitive Sciences, vol. 6, no. 11, pp. 481–487, 2002.
[3] R. A. Brooks, "Intelligence without representation," Artificial Intelligence, vol. 47, no. 1, pp. 139–159, 1991.
[4] A. V. Samsonovich, "Extending cognitive architectures," Advances in Intelligent Systems and Computing, vol. 196 AISC, pp. 41–49, 2013.
[5] A. Lieto, A. Chella, and M. Frixione, "Conceptual spaces for cognitive architectures: A lingua franca for different levels of representation," Biologically Inspired Cognitive Architectures, pp. 1–9, 2016. [Online]. Available: http://dx.doi.org/10.1016/j.bica.2016.10.005
[6] N. Tinbergen, The Study of Instinct. Oxford, UK: Oxford University Press, 1951.
[7] N. Tinbergen and H. Falkus, Signals for Survival. Oxford: Clarendon Press, 1970.
[8] J. J. Bryson, "The study of sequential and hierarchical organisation of behaviour via artificial mechanisms of action selection," M.Phil. thesis, University of Edinburgh, 2000.
[9] J. J. Bryson, "Intelligence by design: Principles of modularity and coordination for engineering complex adaptive agents," Ph.D. dissertation, MIT, Department of EECS, Cambridge, MA, June 2001, AI Technical Report 2001-003.
[10] D. C. Dennett, Kinds of Minds: Towards an Understanding of Consciousness. Weidenfeld and Nicolson, 1996.
[11] J. J. Bryson, "A role for consciousness in action selection," in Proceedings of the AISB 2011 Symposium: Machine Consciousness, R. Chrisley, R. Clowes, and S. Torrance, Eds. York: SSAISB, 2011, pp. 15–20.
[12] J. W. Sherman, B. Gawronski, and Y. Trope, Dual-Process Theories of the Social Mind. Guilford Publications, 2014.
[13] R. H. Wortham, S. E. Gaudl, and J. J. Bryson, "Instinct: A biologically inspired reactive planner for embedded environments," in Proceedings of the ICAPS 2016 PlanRob Workshop, London, UK, 2016. [Online]. Available: http://icaps16.icaps-conference.org/proceedings/planrob16.pdf
[14] R. H. Wortham, A. Theodorou, and J. J. Bryson, "Robot transparency: Improving understanding of intelligent behaviour for designers and users," in Proceedings of TAROS 2017, Guildford, UK, 2017, accepted for publication.
[15] D. C. Dennett, The Intentional Stance. MIT Press, 1989.
[16] P. O. A. Haikonen, "Consciousness and sentient robots," International Journal of Machine Consciousness, vol. 5, no. 1, pp. 11–26, 2013.
[17] J. H. Holland, Emergence: From Chaos to Order. Oxford University Press, 2000.
[18] M. Boden, J. Bryson, D. Caldwell, K. Dautenhahn, L. Edwards, S. Kember, P. Newman, V. Parry, G. Pegman, T. Rodden, T. Sorell, M. Wallis, B. Whitby, and A. Winfield, "Principles of robotics," The United Kingdom's Engineering and Physical Sciences Research Council (EPSRC), April 2011, web publication.
[19] P. Langley, J. E. Laird, and S. Rogers, "Cognitive architectures: Research issues and challenges," Cognitive Systems Research, vol. 10, no. 2, pp. 141–160, June 2009.
[20] O. L. Georgeon and J. B. Marshall, "The small loop problem: A challenge for artificial emergent cognition," Advances in Intelligent Systems and Computing, vol. 196 AISC, pp. 137–144, 2013.